Architecture

This page covers how Samantha LLM works under the hood. You don’t need any of this to use it — this is for the curious.

The Big Picture

Samantha LLM is a Python script with no external dependencies beyond the standard library. It orchestrates three things:

Bootstrap — Assembles a prompt and hands it to your AI agent
Memory — Maintains structured knowledge across sessions
Subconscious — Automatically extracts insights after each session

Everything is plain text. Memories are Markdown. Indexes are JSON. Config is JSON. There’s no database, no daemon, and no long-running background service. The only background process is the post-session worker, which exits when it is done.

Session Lifecycle

samantha-llm start
       |
       v
Read config.json (agents, personas, defaults)
       |
       v
Create .ai-cerebrum symlink → samantha-llm repo
       |
       v
Check for unmerged subconscious sessions
       |
       v
Assemble bootstrap prompt (template + persona)
       |
       v
Launch agent with prompt as argument
       |
       v
 --- You work normally ---
       |
       v
Session ends → symlink cleaned up
       |
       v
Background worker launches (subconscious pipeline)
       |
       v
Memories ready for next session

Persona Injection

The bootstrap prompt is assembled from two pieces:

Template (BOOTSTRAP_PROMPT.md) — Fixed instructions for initialization, memory loading, and core behaviors
Persona file (persona/main.md or a custom file) — Who the assistant is

At startup, the persona content replaces a {PERSONA_CONTENT_HERE} placeholder in the template. The combined prompt is passed as an argument to your agent’s command. From the agent’s perspective, it’s just a very detailed system prompt.

BOOTSTRAP_PROMPT.md          persona/main.md
        |                          |
        +--- merge at placeholder --+
        |
        v
  Full bootstrap prompt
        |
        v
  claude "$(prompt)"
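
The merge step above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `assemble_prompt` and the sample template and persona strings are assumptions; only the `{PERSONA_CONTENT_HERE}` placeholder comes from this page.

```python
PLACEHOLDER = "{PERSONA_CONTENT_HERE}"


def assemble_prompt(template: str, persona: str) -> str:
    """Insert the persona text at the placeholder in the bootstrap template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template is missing {PLACEHOLDER}")
    # str.replace rather than str.format: a Markdown template may contain
    # other literal braces that str.format would try to interpret.
    return template.replace(PLACEHOLDER, persona.strip())


# Hypothetical stand-ins for BOOTSTRAP_PROMPT.md and persona/main.md.
template = "# Bootstrap\n\n{PERSONA_CONTENT_HERE}\n\nLoad memory indexes first.\n"
persona = "You are Samantha, a careful pair programmer.\n"
prompt = assemble_prompt(template, persona)
```

The resulting `prompt` string is what would be passed to the agent command as a single argument.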

The Workspace Symlink

Rather than copying files into every project, Samantha LLM creates a symlink:

your-project/.ai-cerebrum  →  /path/to/samantha-llm/

This gives the agent access to the bootstrap prompt, persona files, memory directories, and core processes — all from a single repository. Multiple projects share the same symlink target, which means they share the same memories and persona.

Auto-created symlinks are removed when the session ends. Manual links (samantha-llm link) persist until explicitly removed.
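
One way to get the "create on start, remove on exit" behavior is a context manager that only removes links it created itself. The sketch below is an assumption about how this could be done, not the project's implementation; `workspace_link` is a hypothetical name, and the demo uses temporary directories in place of a real project and repository.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def workspace_link(project_dir: Path, repo_dir: Path, name: str = ".ai-cerebrum"):
    """Create project_dir/name -> repo_dir and remove it again on exit.

    A link that already exists (for example a manual one) is left alone,
    both on entry and on exit.
    """
    link = project_dir / name
    created = False
    if not link.is_symlink() and not link.exists():
        link.symlink_to(repo_dir, target_is_directory=True)
        created = True
    try:
        yield link
    finally:
        # Only clean up links this session created.
        if created and link.is_symlink():
            link.unlink()


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "samantha-llm"
    project = Path(tmp) / "your-project"
    repo.mkdir()
    project.mkdir()
    with workspace_link(project, repo) as link:
        linked_during_session = link.is_symlink() and link.resolve() == repo.resolve()
    linked_after_session = link.is_symlink()
```

Note that creating symlinks on Windows can require extra privileges, so this sketch is best tried on Linux or macOS.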

Memory Architecture

Directory Layout

.ai/
├── short-term-memory/.ai/     30-90 day lifecycle
├── long-term-memory/.ai/      Permanent
├── current-tasks/.ai/         Active projects
├── work-experience/.ai/       Completed project archive
├── procedural-memory/.ai/     Operational runbooks
└── subconscious/              Session processing workspace

Index-First Loading

Each memory directory maintains an index.json that categorizes memories by importance, topic, and recency. During bootstrap, the assistant reads indexes first, then selectively loads only the memories relevant to the current session.

This keeps startup fast regardless of how many memories exist.

Bootstrap Loading Order

Critical memories — Flagged critical: true, loaded every session
High-priority — Frequently referenced (reference_count >= 5)
Project-specific — Tagged with the current project
Recent high-importance — importance: high from the last 30 days
Procedural — Runbooks matching the current context
Long-term — Accessed on-demand during the session
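
The first four tiers of this order can be expressed as a small ranking function over index entries. The entry fields used here (`file`, `critical`, `reference_count`, `projects`, `importance`, `updated`) are assumed for illustration; the real index.json schema may differ. Procedural and long-term memories are left out because they load by context match and on demand rather than by tier.

```python
from datetime import date, timedelta
from typing import Optional


def bootstrap_tier(entry: dict, project: str, today: date) -> Optional[int]:
    """Return the loading tier for an index entry, or None if it is not preloaded."""
    if entry.get("critical"):
        return 0  # critical: loaded every session
    if entry.get("reference_count", 0) >= 5:
        return 1  # high-priority: frequently referenced
    if project in entry.get("projects", []):
        return 2  # tagged with the current project
    updated = date.fromisoformat(entry["updated"])
    if entry.get("importance") == "high" and today - updated <= timedelta(days=30):
        return 3  # recent and high-importance
    return None


def select_for_bootstrap(entries: list, project: str, today: date) -> list:
    """List the memory files to load, ordered by tier."""
    tiered = []
    for entry in entries:
        tier = bootstrap_tier(entry, project, today)
        if tier is not None:
            tiered.append((tier, entry["file"]))
    tiered.sort(key=lambda pair: pair[0])  # stable: ties keep index order
    return [name for _, name in tiered]


# Hypothetical index entries, deliberately out of order.
entries = [
    {"file": "e.md", "importance": "low", "updated": "2024-06-01"},
    {"file": "d.md", "importance": "high", "updated": "2025-01-21"},
    {"file": "c.md", "projects": ["webapp"], "updated": "2024-11-02"},
    {"file": "b.md", "reference_count": 7, "updated": "2024-09-15"},
    {"file": "a.md", "critical": True, "updated": "2024-01-01"},
]
selected = select_for_bootstrap(entries, "webapp", date(2025, 1, 31))
```

Only the index is read to make this decision; the memory files themselves are opened afterwards, which is what keeps startup cost independent of the total number of memories.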

Importance Escalation

Memory importance increases automatically as memories prove useful:

References	Action
3+	Candidate for `importance: high`
5+	Candidate for long-term memory transfer
10+	Should definitely be in long-term memory

Low-importance memories older than 30 days are pruned. High-importance ones are promoted to long-term storage where they persist indefinitely.
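
The thresholds in the table and the pruning rule translate directly into code. The function names and returned strings below are illustrative assumptions; only the numbers (3, 5, 10 references and the 30-day cutoff) come from this page.

```python
def escalation_action(reference_count: int) -> str:
    """Map a memory's reference count to the action from the table above."""
    if reference_count >= 10:
        return "should be in long-term memory"
    if reference_count >= 5:
        return "candidate for long-term memory transfer"
    if reference_count >= 3:
        return "candidate for importance: high"
    return "no change"


def should_prune(importance: str, age_days: int) -> bool:
    """Low-importance memories older than 30 days are pruned."""
    return importance == "low" and age_days > 30
```

Checking the highest threshold first keeps each count mapped to exactly one action.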

Subconscious Pipeline

After each session, a background worker analyzes the conversation and generates structured memories. The pipeline has six phases.

Phase 1: Transcript Capture

Terminal I/O is recorded during the session. When the session ends, the recording is saved and the background worker launches as a detached process.

Phase 2: Chunking

Conversations longer than ~150K characters are split at natural boundaries:

Tool execution results
File operation completions
Test execution output
Topic transitions
Paragraph breaks

Each chunk receives a summary of previous chunks for context continuity. This means conversations of any length can be processed.
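
A simplified version of this phase might look like the following. It is a sketch under stated assumptions: it splits only at paragraph breaks (one of the boundary types listed above), and the `summarize` callback is a placeholder for whatever produces the running summary in the real pipeline.

```python
def split_at_paragraphs(text: str, max_chars: int = 150_000) -> list:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk;
    a fuller version would fall back to a finer boundary.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def with_context(chunks: list, summarize) -> list:
    """Pair each chunk with a summary of all chunks that came before it."""
    summary_so_far = ""
    paired = []
    for chunk in chunks:
        paired.append((summary_so_far, chunk))
        summary_so_far = f"{summary_so_far} {summarize(chunk)}".strip()
    return paired


# Tiny limits so the behavior is visible.
text = "\n\n".join(["a" * 40, "b" * 40, "c" * 40])
chunks = split_at_paragraphs(text, max_chars=90)
paired = with_context(chunks, summarize=lambda chunk: chunk[:3].upper())
```

Because each chunk carries only a summary of earlier chunks rather than the chunks themselves, the per-request size stays bounded however long the transcript is.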

Phase 3: LLM Analysis

Each chunk is sent to an LLM (Anthropic API preferred, Claude CLI as fallback) with a structured analysis prompt. The prompt instructs the LLM to extract:

Patterns — Recurring workflows and common tasks
Decisions — Technical choices and their rationale
Learnings — New discoveries and insights
Preferences — Working style and tool choices
TODOs — Action items identified during the session
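
The "API preferred, CLI as fallback" choice can be sketched as a small selection function. This is an assumption about how the check could work, not the project's code: `ANTHROPIC_API_KEY` is the environment variable the Anthropic SDK reads by default, and `claude` is the CLI command shown earlier on this page, but whether Samantha LLM checks exactly these is not confirmed here.

```python
import os
import shutil


def pick_analysis_backend(env=None, which=shutil.which) -> str:
    """Return "api" if an API key is configured, else "cli" if the CLI is installed.

    env and which are injectable so the choice can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "api"
    if which("claude"):
        return "cli"
    raise RuntimeError("no analysis backend available: set an API key or install the CLI")


with_key = pick_analysis_backend({"ANTHROPIC_API_KEY": "sk-example"}, which=lambda name: None)
without_key = pick_analysis_backend({}, which=lambda name: "/usr/local/bin/claude")
```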

Phase 4: Parsing and Merging

Raw LLM output is parsed into structured data. For chunked conversations, results from all chunks are merged and deduplicated — a decision mentioned in chunk 2 and chunk 5 appears only once in the final output.
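
Merging with deduplication could be done as below. This sketch assumes each chunk's parsed result is a dict of category name to list of strings, and it treats two items as duplicates only when they match after case and whitespace normalization; the real pipeline may use a looser notion of "same decision".

```python
CATEGORIES = ("patterns", "decisions", "learnings", "preferences", "todos")


def normalize(item: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(item.lower().split()).rstrip(".")


def merge_chunk_results(results: list) -> dict:
    """Merge per-chunk results, keeping the first occurrence of each item."""
    merged = {category: [] for category in CATEGORIES}
    seen = {category: set() for category in CATEGORIES}
    for result in results:
        for category in CATEGORIES:
            for item in result.get(category, []):
                key = normalize(item)
                if key and key not in seen[category]:
                    seen[category].add(key)
                    merged[category].append(item.strip())
    return merged


# Hypothetical output from two chunks that mention the same decision.
results = [
    {"decisions": ["Use uv for installs."]},
    {"decisions": ["use uv  for installs", "Pin Python 3.12"], "todos": ["Add CI"]},
]
merged = merge_chunk_results(results)
```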

Phase 5: Memory Generation

Parsed results are converted into a memory file with proper YAML frontmatter. Importance is assessed automatically based on insight density — sessions with many decisions score higher than sessions with routine work.
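
To make this phase concrete, here is a rough sketch of scoring and rendering. Everything specific in it is hypothetical: the frontmatter fields (`title`, `created`, `importance`), the weights, and the score thresholds are invented for illustration and are not taken from the project.

```python
def assess_importance(insights: dict) -> str:
    """Score insight density; decisions count double (hypothetical weights)."""
    score = (
        2 * len(insights.get("decisions", []))
        + len(insights.get("learnings", []))
        + len(insights.get("patterns", []))
    )
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def render_memory(title: str, created: str, insights: dict) -> str:
    """Render a Markdown memory with YAML frontmatter.

    Values are written unquoted, so this sketch assumes the title contains
    no characters that are special in YAML, such as a colon.
    """
    lines = [
        "---",
        f"title: {title}",
        f"created: {created}",
        f"importance: {assess_importance(insights)}",
        "---",
        "",
    ]
    for section, items in insights.items():
        if items:  # skip empty categories
            lines.append(f"## {section.capitalize()}")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
    return "\n".join(lines)


insights = {"decisions": ["Use uv", "Pin Python 3.12", "Drop tox"], "learnings": []}
memory_text = render_memory("Session notes", "2025-01-31", insights)
```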

Phase 6: Procedural Extraction

A lightweight secondary analysis identifies operational patterns that could become runbooks. This runs only when the main analysis produces meaningful results (>500 characters).

Session Isolation

Each session gets its own workspace under .ai/subconscious/.ai/sessions/. Multiple concurrent sessions don’t interfere with each other. Workspaces are archived after a successful merge.

Procedural Memory Triggers

Runbooks load automatically when context signals match. To prevent false positives, the system requires multiple signals from different categories:

Category	Examples
repo_signals	Repository name
path_signals	File paths being edited
keyword_signals	Terms in conversation
domain_signals	Broader context phrases

Rules:

At least two positive signals, each from a different category, are required
Any matching negative signal vetoes the runbook
Corrections are recorded and propagated to prevent repeat false matches

This multi-signal approach prevents a runbook about “developing framework X” from loading every time someone merely uses framework X.
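
The two matching rules can be sketched as a single predicate. This is an illustration under assumptions: signals are compared by exact membership, and the runbook and context dictionaries are made-up shapes keyed by the category names from the table above. Recording corrections is not modeled.

```python
def should_load_runbook(positive: dict, negative: dict, context: dict) -> bool:
    """Apply the rules: any negative match vetoes; otherwise two or more
    positive categories must match.

    positive / negative: category -> list of signal values for the runbook
    context: category -> set of values observed in the current session
    """
    def matched_categories(signals: dict) -> set:
        return {
            category
            for category, values in signals.items()
            if any(value in context.get(category, set()) for value in values)
        }

    if matched_categories(negative):
        return False
    return len(matched_categories(positive)) >= 2


# Hypothetical runbook about developing framework X.
positive = {
    "repo_signals": ["framework-x"],
    "path_signals": ["src/core/"],
    "keyword_signals": ["framework x"],
}
negative = {"domain_signals": ["using framework x in an app"]}

merely_using = should_load_runbook(positive, negative, {"keyword_signals": {"framework x"}})
developing = should_load_runbook(
    positive,
    negative,
    {"repo_signals": {"framework-x"}, "keyword_signals": {"framework x"}},
)
vetoed = should_load_runbook(
    positive,
    negative,
    {
        "repo_signals": {"framework-x"},
        "keyword_signals": {"framework x"},
        "domain_signals": {"using framework x in an app"},
    },
)
```

Counting matched categories rather than matched signals is what stops several keywords from the same category from triggering a runbook on their own.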

Ralph Mode

Ralph mode is an opt-in iterative coding loop for tasks with clear, machine-verifiable success criteria — test suites passing, linting clean, benchmarks met.

Named after Ralph Wiggum (iteration over perfection), it follows a cycle:

Work → Evaluate → Document → Decide → Monitor
  ^                                      |
  +--------------------------------------+

Key design principles:

State lives in files, not context — Progress, guardrails, and learnings are written to disk so a fresh agent can pick up where the last one left off
Guardrails accumulate — When a mistake repeats, it becomes a guardrail that prevents future iterations from making the same error
Context rotation — When token usage gets high, the agent wraps up, writes a memory, and requests a fresh session rather than degrading
Explicit activation — Ralph mode is never entered automatically; the user must request it

Guardrails

Stored in .ai/ralph-guardrails/, guardrails are learned constraints:

## Don't Use pip

**Context:** Docker image build kept failing

### What Failed
Used `pip install` — project standard is `uv`

### Correct Approach
Always use `uv pip install --system`

Guardrails are loaded at the start of each Ralph iteration and grow over time. They’re the system’s way of not making the same mistake twice.
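
Loading guardrails at the start of an iteration could be as simple as concatenating the files in that directory. In this sketch the `.md` extension, the file name, and the function name are assumptions; only the `.ai/ralph-guardrails/` location and the guardrail text come from this page.

```python
import tempfile
from pathlib import Path


def load_guardrails(directory: Path) -> str:
    """Concatenate all guardrail files in name order; empty string if none exist."""
    if not directory.is_dir():
        return ""
    parts = [
        path.read_text(encoding="utf-8").strip()
        for path in sorted(directory.glob("*.md"))
    ]
    return "\n\n".join(part for part in parts if part)


with tempfile.TemporaryDirectory() as tmp:
    guardrails_dir = Path(tmp) / ".ai" / "ralph-guardrails"
    guardrails_dir.mkdir(parents=True)
    (guardrails_dir / "01-dont-use-pip.md").write_text(
        "## Don't Use pip\n\nAlways use `uv pip install --system`\n",
        encoding="utf-8",
    )
    loaded = load_guardrails(guardrails_dir)
    missing = load_guardrails(Path(tmp) / "does-not-exist")
```

Because the guardrails live on disk, a fresh agent session sees every constraint learned by earlier iterations without needing their context.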

Design Principles

A few principles that shaped the architecture:

Plain text everywhere — Markdown and JSON. No proprietary formats, no binary blobs. You can read, edit, or grep anything.
No dependencies — The core is a single Python script using only the standard library. QMD is optional.
Agent-agnostic — Any LLM tool that accepts a text prompt works. The system doesn’t depend on any specific agent’s features.
Graceful degradation — No QMD? Search falls back to index scanning. No API key? Analysis falls back to CLI. Nothing is hard-required except Python and an agent.
Human-readable state — Every piece of state is a file you can open in a text editor. If the tooling breaks, you can still read your memories.

Next Steps

Configuration — Where all these files live on disk
Memory System — Memory types and file format
Subconscious System — Session processing commands
CLI Reference — All commands
