Architecture
This page covers how Samantha LLM works under the hood. You don’t need any of this to use it — this is for the curious.
The Big Picture
Samantha LLM is a Python script with no external dependencies beyond the standard library. It orchestrates three things:
- Bootstrap — Assembles a prompt and hands it to your AI agent
- Memory — Maintains structured knowledge across sessions
- Subconscious — Automatically extracts insights after each session
Everything is plain text. Memories are Markdown. Indexes are JSON. Config is JSON. There’s no database, no daemon, no background service (except the post-session worker, which exits when done).
Session Lifecycle
```
samantha-llm start
        |
        v
Read config.json (agents, personas, defaults)
        |
        v
Create .ai-cerebrum symlink → samantha-llm repo
        |
        v
Check for unmerged subconscious sessions
        |
        v
Assemble bootstrap prompt (template + persona)
        |
        v
Launch agent with prompt as argument
        |
        v
--- You work normally ---
        |
        v
Session ends → symlink cleaned up
        |
        v
Background worker launches (subconscious pipeline)
        |
        v
Memories ready for next session
```
Persona Injection
The bootstrap prompt is assembled from two pieces:
- Template (`BOOTSTRAP_PROMPT.md`) — Fixed instructions for initialization, memory loading, and core behaviors
- Persona file (`persona/main.md` or a custom file) — Who the assistant is
At startup, the persona content replaces a `{PERSONA_CONTENT_HERE}` placeholder in the template. The combined prompt is passed as an argument to your agent’s command. From the agent’s perspective, it’s just a very detailed system prompt.
```
BOOTSTRAP_PROMPT.md     persona/main.md
        |                      |
        +-- merge at placeholder --+
                    |
                    v
         Full bootstrap prompt
                    |
                    v
          claude "$(prompt)"
```
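The merge itself is a plain string substitution. A minimal sketch of that step (the function name is illustrative; only the `{PERSONA_CONTENT_HERE}` placeholder comes from the doc):

```python
PLACEHOLDER = "{PERSONA_CONTENT_HERE}"

def assemble_bootstrap(template: str, persona: str) -> str:
    """Merge the persona content into the fixed template at the placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError("template is missing the persona placeholder")
    return template.replace(PLACEHOLDER, persona)
```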
The Workspace Symlink
Rather than copying files into every project, Samantha LLM creates a symlink:
```
your-project/.ai-cerebrum → /path/to/samantha-llm/
```
This gives the agent access to the bootstrap prompt, persona files, memory directories, and core processes — all from a single repository. Multiple projects share the same symlink target, which means they share the same memories and persona.
Auto-created symlinks are removed when the session ends. Manual links (`samantha-llm link`) persist until explicitly removed.
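The create/cleanup cycle can be sketched with `pathlib`. A minimal version, assuming the symlink is only ever removed when it is actually a symlink (function names are illustrative):

```python
from pathlib import Path

def link_workspace(project_dir: Path, repo_dir: Path) -> Path:
    """Create project/.ai-cerebrum -> repo symlink, replacing a stale one."""
    link = project_dir / ".ai-cerebrum"
    if link.is_symlink():
        link.unlink()  # replace a stale link from a previous session
    link.symlink_to(repo_dir, target_is_directory=True)
    return link

def unlink_workspace(project_dir: Path) -> None:
    """Remove an auto-created symlink on session end; never touch a real directory."""
    link = project_dir / ".ai-cerebrum"
    if link.is_symlink():
        link.unlink()
```

The `is_symlink()` guard is what keeps cleanup safe: a real `.ai-cerebrum` directory would never be deleted.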
Memory Architecture
Directory Layout
```
.ai/
├── short-term-memory/     30-90 day lifecycle
├── long-term-memory/      Permanent
├── current-tasks/         Active projects
├── work-experience/       Completed project archive
├── procedural-memory/     Operational runbooks
└── subconscious/          Session processing workspace
```
Index-First Loading
Each memory directory maintains an `index.json` that categorizes memories by importance, topic, and recency. During bootstrap, the assistant reads indexes first, then selectively loads only the memories relevant to the current session. This keeps startup fast regardless of how many memories exist.
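A sketch of index-first selection. The index schema here (a `memories` list with `file`, `critical`, and `projects` keys) is an assumption for illustration, not the documented format:

```python
import json
from pathlib import Path

def select_from_index(index_path: Path, project: str) -> list[str]:
    """Read index.json first and pick only the memory files worth loading."""
    index = json.loads(index_path.read_text())
    selected = []
    for entry in index.get("memories", []):
        # Hypothetical schema: critical memories always load,
        # plus anything tagged with the current project.
        if entry.get("critical") or project in entry.get("projects", []):
            selected.append(entry["file"])
    return selected
```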
Bootstrap Loading Order
- Critical memories — Flagged `critical: true`, loaded every session
- High-priority — Frequently referenced (`reference_count >= 5`)
- Project-specific — Tagged with the current project
- Recent high-importance — `importance: high` from the last 30 days
- Procedural — Runbooks matching the current context
- Long-term — Accessed on-demand during the session
Importance Escalation
Memory importance increases automatically as memories prove useful:
| References | Action |
|---|---|
| 3+ | Candidate for `importance: high` |
| 5+ | Candidate for long-term memory transfer |
| 10+ | Should definitely be in long-term memory |
Low-importance memories older than 30 days are pruned. High-importance ones are promoted to long-term storage where they persist indefinitely.
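The escalation thresholds and the pruning rule reduce to two small decision functions. A sketch (function names and return strings are illustrative):

```python
def escalation_action(reference_count: int) -> str:
    """Map a memory's reference count to the actions in the table above."""
    if reference_count >= 10:
        return "move to long-term memory"
    if reference_count >= 5:
        return "long-term transfer candidate"
    if reference_count >= 3:
        return "importance: high candidate"
    return "no change"

def should_prune(importance: str, age_days: int) -> bool:
    """Low-importance memories older than 30 days are pruned."""
    return importance == "low" and age_days > 30
```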
Subconscious Pipeline
After each session, a background worker analyzes the conversation and generates structured memories. The pipeline has six phases.
Phase 1: Transcript Capture
Terminal I/O is recorded during the session. When the session ends, the recording is saved and the background worker launches as a detached process.
Phase 2: Chunking
Conversations longer than ~150K characters are split at natural boundaries:
- Tool execution results
- File operation completions
- Test execution output
- Topic transitions
- Paragraph breaks
Each chunk receives a summary of previous chunks for context continuity. This means conversations of any length can be processed.
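A simplified sketch of the splitting logic, modeling only the paragraph-break boundary (the real pipeline also splits at tool results, file operations, test output, and topic transitions):

```python
def chunk_transcript(text: str, limit: int = 150_000) -> list[str]:
    """Split a long transcript into chunks under `limit`, breaking at paragraphs."""
    if len(text) <= limit:
        return [text]
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```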
Phase 3: LLM Analysis
Each chunk is sent to an LLM (Anthropic API preferred, Claude CLI as fallback) with a structured analysis prompt. The prompt instructs the LLM to extract:
- Patterns — Recurring workflows and common tasks
- Decisions — Technical choices and their rationale
- Learnings — New discoveries and insights
- Preferences — Working style and tool choices
- TODOs — Action items identified during the session
Phase 4: Parsing and Merging
Raw LLM output is parsed into structured data. For chunked conversations, results from all chunks are merged and deduplicated — a decision mentioned in chunk 2 and chunk 5 appears only once in the final output.
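The merge-and-deduplicate step can be sketched as follows, assuming each chunk's parsed result is a dict of category to insight strings (the exact data shape is an assumption):

```python
def merge_chunk_results(chunk_results: list[dict]) -> dict:
    """Merge per-chunk analyses, keeping each insight once (case-insensitive)."""
    merged: dict = {}
    for result in chunk_results:
        for category, items in result.items():
            bucket = merged.setdefault(category, [])
            seen = {existing.lower() for existing in bucket}
            for item in items:
                if item.lower() not in seen:
                    bucket.append(item)
                    seen.add(item.lower())
    return merged
```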
Phase 5: Memory Generation
Parsed results are converted into a memory file with proper YAML frontmatter. Importance is assessed automatically based on insight density — sessions with many decisions score higher than sessions with routine work.
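A sketch of the rendering step. The frontmatter fields and the density thresholds here are illustrative assumptions, not the documented format:

```python
from datetime import date

def render_memory(title: str, insights: dict, today: date) -> str:
    """Render a memory file with YAML frontmatter; importance tracks insight density."""
    count = sum(len(items) for items in insights.values())
    # Hypothetical thresholds: more extracted insights -> higher importance.
    importance = "high" if count >= 5 else "medium" if count >= 2 else "low"
    lines = ["---", f"title: {title}", f"date: {today.isoformat()}",
             f"importance: {importance}", "---", ""]
    for category, items in insights.items():
        lines.append(f"## {category.title()}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)
```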
Phase 6: Procedural Extraction
A lightweight secondary analysis identifies operational patterns that could become runbooks. This runs only when the main analysis produces meaningful results (>500 characters).
Session Isolation
Each session gets its own workspace under `.ai/subconscious/sessions/`. Multiple concurrent sessions don’t interfere with each other. Workspaces are archived after a successful merge.
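Per-session isolation amounts to giving every worker a uniquely named directory. A minimal sketch (the naming scheme is an assumption):

```python
import uuid
from pathlib import Path

def create_session_workspace(root: Path) -> Path:
    """Give each session a unique workspace so concurrent runs don't collide."""
    workspace = root / "sessions" / uuid.uuid4().hex
    workspace.mkdir(parents=True, exist_ok=False)
    return workspace
```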
Procedural Memory Triggers
Runbooks load automatically when context signals match. To prevent false positives, the system requires multiple signals from different categories:
| Category | Examples |
|---|---|
| repo_signals | Repository name |
| path_signals | File paths being edited |
| keyword_signals | Terms in conversation |
| domain_signals | Broader context phrases |
Rules:
- At least 2 positive signals from different categories required
- Any matching negative signal vetoes the runbook
- Corrections are recorded and propagated to prevent repeat false matches
This multi-signal approach prevents a runbook about “developing framework X” from loading every time someone merely uses framework X.
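The matching rules can be sketched as a single predicate. Treating the context as one flat string, and the `negative_signals` key, are simplifying assumptions for illustration:

```python
def runbook_matches(signals: dict, context: str) -> bool:
    """At least 2 positive categories must match; any negative signal vetoes."""
    ctx = context.lower()
    # Any matching negative signal vetoes the runbook outright.
    if any(neg.lower() in ctx for neg in signals.get("negative_signals", [])):
        return False
    categories = ("repo_signals", "path_signals", "keyword_signals", "domain_signals")
    hits = sum(
        1 for cat in categories
        if any(sig.lower() in ctx for sig in signals.get(cat, []))
    )
    return hits >= 2
```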
Ralph Mode
Ralph mode is an opt-in iterative coding loop for tasks with clear, machine-verifiable success criteria — test suites passing, linting clean, benchmarks met.
Named after Ralph Wiggum (iteration over perfection), it follows a cycle:
```
Work → Evaluate → Document → Decide → Monitor
  ^                                      |
  +--------------------------------------+
```
Key design principles:
- State lives in files, not context — Progress, guardrails, and learnings are written to disk so a fresh agent can pick up where the last one left off
- Guardrails accumulate — When a mistake repeats, it becomes a guardrail that prevents future iterations from making the same error
- Context rotation — When token usage gets high, the agent wraps up, writes a memory, and requests a fresh session rather than degrading
- Explicit activation — Ralph mode is never entered automatically; the user must request it
Guardrails
Stored in `.ai/ralph-guardrails/`, guardrails are learned constraints:

```markdown
## Don't Use pip

**Context:** Docker image build kept failing

### What Failed
Used `pip install` — project standard is `uv`

### Correct Approach
Always use `uv pip install --system`
```
Guardrails are loaded at the start of each Ralph iteration and grow over time. They’re the system’s way of not making the same mistake twice.
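Loading guardrails at iteration start is simple concatenation of everything in the directory. A sketch, assuming guardrails are `.md` files and order follows filename sort:

```python
from pathlib import Path

def load_guardrails(guardrail_dir: Path) -> str:
    """Concatenate every guardrail file for injection at iteration start."""
    if not guardrail_dir.is_dir():
        return ""  # no guardrails accumulated yet
    parts = [p.read_text().strip() for p in sorted(guardrail_dir.glob("*.md"))]
    return "\n\n".join(parts)
```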
Design Principles
A few principles that shaped the architecture:
- Plain text everywhere — Markdown and JSON. No proprietary formats, no binary blobs. You can read, edit, or `grep` anything.
- No dependencies — The core is a single Python script using only the standard library. QMD is optional.
- Agent-agnostic — Any LLM tool that accepts a text prompt works. The system doesn’t depend on any specific agent’s features.
- Graceful degradation — No QMD? Search falls back to index scanning. No API key? Analysis falls back to CLI. Nothing is hard-required except Python and an agent.
- Human-readable state — Every piece of state is a file you can open in a text editor. If the tooling breaks, you can still read your memories.
Next Steps
- Configuration — Where all these files live on disk
- Memory System — Memory types and file format
- Subconscious System — Session processing commands
- CLI Reference — All commands