Samantha LLM — Persistent Memory for AI Assistants

Documentation and website for samantha-llm


Memory Search

Samantha LLM works without search — memories are loaded by priority during bootstrap. But as your memory collection grows, semantic search lets you (and your assistant) find exactly the right context fast.

Memory search is powered by QMD, a local semantic search engine. Everything runs on your machine — no API calls, no cloud indexing.

Installing QMD

samantha-llm qmd install

This installs the Bun runtime and the QMD search engine. AI models (~2GB) are downloaded automatically on first use.

# Check installation status
samantha-llm qmd status

# Quick check (exit code 0 if installed)
samantha-llm qmd check
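Because `samantha-llm qmd check` signals installation through its exit code, scripts can gate other steps on it. A minimal Python sketch, with the command runner injected so the logic can be exercised even where the CLI is not installed:

```python
import subprocess

def qmd_installed(run=subprocess.run):
    """Return True if `samantha-llm qmd check` exits with code 0."""
    # The runner is injectable so this helper is testable without the CLI.
    result = run(["samantha-llm", "qmd", "check"])
    return result.returncode == 0
```

In a real script you would call `qmd_installed()` with no arguments and fall back to non-search behaviour when it returns False.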

Indexing Your Memories

Before searching, index your memory files:

samantha-llm memories index

This indexes all Markdown files across your memory directories. Only changed files are re-indexed on subsequent runs, so indexing stays fast as your collection grows.
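The incremental behaviour can be pictured as a content-hash check: a file is re-indexed only when its current hash differs from the one recorded at the last run. This is an illustration of the idea, not QMD's actual bookkeeping:

```python
import hashlib

def files_to_reindex(contents, seen_hashes):
    """Given {path: text} and {path: hash-from-last-run}, return the
    paths that are new or changed, and update the hash record."""
    changed = []
    for path, text in contents.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if seen_hashes.get(path) != digest:
            changed.append(path)
            seen_hashes[path] = digest  # remember for the next run
    return changed
```

On the first run everything is "changed"; afterwards only edited files are returned, which is why repeated indexing stays cheap.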

Searching

From the Command Line

samantha-llm memories search "authentication decision"

Results show matching memory excerpts with context, ranked by relevance.

Options:

Flag      Effect
-n 20     Return more results (default: 10)
--json    JSON output for scripting
--text    Plain text output
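The `--json` flag makes results easy to post-process. The output schema is not documented here, so the field names below (`path`, `score`) are assumptions for illustration only:

```python
import json

def top_paths(json_text, limit=3):
    """Parse search output and return the highest-scoring memory paths.
    The `path`/`score` field names are assumed, not QMD's documented schema."""
    results = json.loads(json_text)
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["path"] for r in ranked[:limit]]

# Hypothetical output shape for demonstration:
sample = ('[{"path": "decisions/auth.md", "score": 0.91},'
          ' {"path": "notes/oauth.md", "score": 0.84}]')
```

Piping the real command's output into a helper like this lets scripts act on just the best matches.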

During Sessions

Your assistant can search memories during a session using the same command. This is useful when a question comes up that might be answered by a past decision or learning — the assistant searches, reads the relevant memories, and incorporates that context into the conversation.
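One way to wire this into tooling is to shell out to the search command and fold excerpts into the conversation context. A sketch with the runner injectable and an assumed `excerpt` field (the real JSON schema may differ):

```python
import json
import subprocess

def memory_context(query, run=subprocess.run):
    """Search memories and format excerpts as a context block.
    Assumes --json output with `path` and `excerpt` fields."""
    result = run(
        ["samantha-llm", "memories", "search", query, "--json"],
        capture_output=True, text=True,
    )
    hits = json.loads(result.stdout)
    return "\n".join(f"[{h['path']}] {h['excerpt']}" for h in hits)
```

The returned block can then be prepended to the assistant's working context before it answers.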

Search Modes

QMD supports three search modes, each with different trade-offs:

Hybrid (Default)

Combines keyword matching, semantic understanding, and AI re-ranking for the best results. This is what runs when you use samantha-llm memories search.

Keyword

BM25 full-text search. Fastest option — good when you know the exact terms you’re looking for.

Semantic

Vector embedding similarity. Finds conceptually related memories even when the wording is different — useful for questions like “what did we decide about deployment?” when the memory uses words like “release process.”
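The trade-offs between the modes can be made concrete with a toy hybrid scorer that blends a keyword-overlap score with a vector-similarity score. This is a conceptual sketch only, not QMD's actual ranking (which also includes AI re-ranking):

```python
import math

def keyword_score(query, doc):
    """Fraction of query terms found in the document (toy BM25 stand-in)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """Blend keyword and semantic signals into one relevance score."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

Note how a query like "deployment" scores zero on keywords against a memory about a "release process", yet can still rank highly if their embeddings are close: that is exactly the gap the semantic mode covers.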

Without QMD

QMD is optional. Without it, Samantha LLM still works: memories are loaded by priority during bootstrap, just without semantic search on top. The main thing you lose is the ability to search across hundreds of memories by meaning rather than filename. For small memory collections, you may not need it at all.
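For a small collection, a plain text scan is often enough. A sketch of a literal-match fallback over a directory of Markdown memories, using only the standard library (names are illustrative, not part of samantha-llm):

```python
from pathlib import Path

def grep_memories(root, term):
    """Return paths of Markdown files whose name or content contains `term`.
    A literal-match fallback -- no semantic matching, unlike QMD."""
    term = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        if term in path.name.lower() or term in path.read_text().lower():
            hits.append(str(path))
    return hits
```

This finds exact words only; "deployment" will not surface a memory that talks about a "release process", which is the gap semantic search fills.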

Models

QMD uses three local AI models, downloaded to ~/.cache/qmd/models/ on first use:

Model            Size    Purpose
Embedding        ~300MB  Converts text to vectors for semantic matching
Re-ranking       ~640MB  Scores and orders candidate results
Query expansion  ~1.1GB  Reformulates queries for better recall

All models run locally. No data leaves your machine.

Next Steps