Honcho: The Secret Weapon Top AI Engineers Use for Agent Memory
Honcho: The Secret Weapon Top AI Engineers Use for Agent Memory
Your AI agent just forgot everything. Again. That customer it's been helping for six months? Blank slate. The complex debugging session from yesterday? Gone. The personalized learning path it carefully constructed? Poof. You're not alone—every developer building with LLMs hits this wall. Vector databases dump chunks of text. RAG systems retrieve irrelevant snippets. Simple memory stores are just glorified key-value pairs with delusions of grandeur. But what if your agents could actually understand the people they interact with? What if they could reason about changing behaviors, evolving preferences, and complex relationships over time? That's not science fiction. That's Honcho—the memory infrastructure that's redefining what stateful agents can achieve.
What is Honcho?
Honcho is memory infrastructure for building stateful agents that understand changing people, agents, groups, projects, and ideas over time. Built by Plastic Labs, this open-source FastAPI server (licensed under AGPL-3.0) represents a fundamental shift from storage-first to reasoning-first memory. While most memory solutions treat conversations as searchable text dumps, Honcho extracts conclusions, builds psychological models, and maintains dynamic representations of every entity in your system.
The project is structured across multiple repositories, with this core repo housing the server logic and client SDKs living in dedicated directories. You can use it as a managed service at api.honcho.dev (with $100 free credits on signup) or self-host entirely. The current server version is 3.0.6, with mature Python and TypeScript SDKs available.
Why is Honcho trending now? Because the agent landscape has shifted. Simple tool-calling agents are everywhere—but agents that remember, that adapt, that build relationships? That's the competitive moat. Honcho has literally defined the Pareto Frontier of Agent Memory through rigorous benchmarking on LongMemEval, LoCoMo, and other long-conversation evals. When retention and trust become your differentiators, raw model intelligence stops being enough.
Key Features That Separate Honcho from the Herd
Reasoning-first memory architecture. Honcho doesn't just store messages—it processes them. A background "deriver" worker asynchronously analyzes interactions, extracts deductive and inductive conclusions, and updates peer representations. This isn't retrieval-augmented generation; it's comprehension-augmented generation.
Peer-centric modeling. Humans and AI agents are first-class citizens. Honcho tracks what each peer knows about others, enabling complex multi-agent scenarios where Agent A understands User B's preferences differently than Agent C does. The observation model is configurable—you control which peers observe which others.
Multi-perspective representations. Through internal collections keyed by (observer, observed) pairs, Honcho maintains distinct psychological models. Self-representation (observer == observed) and cross-peer modeling happen through the same unified mechanism.
Hybrid search with semantic depth. BM25 + vector search across workspaces, sessions, and peers—but the killer feature is querying natural-language insights through the Chat Endpoint. Ask "What learning styles does this user respond to?" and get a reasoning-grounded answer, not a pile of chunks.
Prompt-ready context injection. The session.context() method returns token-limited, summary-optimized bundles with .to_openai() and .to_anthropic() converters. Stop wrestling with context window management—Honcho does it.
MCP-native integrations. Claude Code, Cursor, Cline, Windsurf, OpenCode, OpenClaw, Hermes—Honcho plugs into your existing workflow without friction. The plugin architecture means your coding agents get persistent memory without code changes.
Use Cases Where Honcho Absolutely Dominates
1. Personalized Education & Tutoring
Your tutoring agent meets Alice. Six months later, it remembers she panics at timed tests, prefers visual proofs over algebraic manipulation, and finally clicked when you used cooking metaphors for fractions. Without Honcho? Every session starts from zero. With Honcho? alice.chat("What learning styles does the user respond to best?") returns actionable insights that transform engagement.
2. Complex Coding Agents & DevTools
Claude Code with Honcho remembers your codebase preferences, your recurring bugs, your architectural decisions. That MCP integration isn't just storage—it's evolving understanding. Migrate from legacy MEMORY.md files (OpenClaw does this non-destructively) and watch your agent anticipate needs before you articulate them.
3. Multi-Agent Collaboration Systems
Three agents negotiating resource allocation, each with distinct models of human stakeholders. Agent A knows the CTO prioritizes security; Agent B understands the CTO actually cares about velocity but can't admit it; Agent C tracks how the CTO's priorities shifted after the last breach. Honcho's peer observation model makes this tractable.
4. Long-Running Customer Relationships
Support agents that remember emotional context across years. The frustrated user from 2023 who became a champion? Your agent knows the journey. The enterprise client whose requirements evolved through three pivots? All captured in queryable conclusions, not buried in ticket archives.
5. Creative & Research Assistants
Writing partners that understand your evolving style. Research assistants that track your shifting hypotheses. The representation system captures not just facts, but how your thinking changes—enabling genuinely collaborative intelligence.
Step-by-Step Installation & Setup Guide
Managed Service (Fastest Path)
Head to app.honcho.dev, create an organization, and grab your API key. You'll get a dedicated Honcho instance with $100 in free credits. No infrastructure to manage.
Python SDK Installation
# Standard pip
pip install honcho-ai
# Modern Python tooling
uv add honcho-ai
# Or Poetry for dependency management
poetry add honcho-ai
TypeScript SDK Installation
# npm
npm install @honcho-ai/sdk
# Or bun for speed
bun add @honcho-ai/sdk
Self-Hosting with Docker (Full Control)
# Clone the repository
git clone https://github.com/plastic-labs/honcho.git
cd honcho
# Copy configuration templates
cp docker-compose.yml.example docker-compose.yml
cp .env.template .env
# Edit .env with your LLM provider keys:
# LLM_GEMINI_API_KEY=your_key # deriver, summary, dialectic minimal/low
# LLM_ANTHROPIC_API_KEY=your_key # dialectic medium/high/max, dream
# LLM_OPENAI_API_KEY=your_key # embeddings when EMBED_MESSAGES=true
# Launch everything
docker compose up
Point SDKs at your local instance:
honcho = Honcho(
workspace_id="my-app-testing",
base_url="http://localhost:8000" # or export HONCHO_URL
)
Local Development Setup (No Docker)
Requires Python ≥3.10 and uv ≥0.5.0:
cd honcho
uv sync # Creates .venv with dependencies
source .venv/bin/activate # Activate environment
# Database: use Supabase or local Docker
cp docker-compose.yml.example docker-compose.yml
docker compose up -d database # Postgres with pgvector
# Configure and migrate
cp .env.template .env # Fill in DB_CONNECTION_URI (postgresql+psycopg://...)
# Add LLM provider keys
uv run alembic upgrade head # Create all tables
# Launch services (two terminals)
uv run fastapi dev src/main.py # API server with hot reload
uv run python -m src.deriver # Background reasoning worker
The deriver is critical—it's what transforms stored messages into intelligence. Scale by running multiple deriver instances.
REAL Code Examples from the Repository
Example 1: The Complete Honcho Loop (Python)
This is the canonical pattern—store, reason, query, inject. Straight from the README, with detailed annotation:
import os
from honcho import Honcho
from openai import OpenAI
# Initialize with workspace isolation. Each workspace is a data silo
# for different apps, features, or tenants.
honcho = Honcho(
workspace_id="my-app-testing",
api_key=os.environ["HONCHO_API_KEY"],
# base_url="http://localhost:8000" # Uncomment for self-hosted
)
# STEP 1: STORE — Create peers and capture interaction history
# Peers are first-class: humans AND agents are the same primitive
alice = honcho.peer("alice") # The human student
tutor = honcho.peer("tutor") # The AI tutor agent
# Sessions are many-to-many: multiple peers, flexible participation
session = honcho.session("session-1")
# Messages are atomic, peer-labelled units. Honcho ingests these
# and queues them for background reasoning.
session.add_messages([
alice.message("Hey there — can you help me with my math homework?"),
tutor.message("Absolutely. Send me your first problem!"),
])
# STEP 2: REASON — Happens asynchronously via the deriver worker.
# Honcho extracts conclusions, updates representations, generates summaries.
# This is NOT immediate; new messages take moments to propagate.
# STEP 3: QUERY — Natural language insights or prompt-ready context
# The chat endpoint reasons over conclusions to answer questions
answer = alice.chat("What learning styles does the user respond to best?")
# context() returns token-managed, summary-optimized bundles
# summary=True collapses history; tokens=10_000 limits output
context = session.context(summary=True, tokens=10_000)
# STEP 4: INJECT — Drop into any model or framework
client = OpenAI()
completion = client.chat.completions.create(
model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
# to_openai() formats with proper role assignment
# assistant=tutor marks tutor's messages correctly
messages=context.to_openai(assistant=tutor),
)
What's happening under the hood? The add_messages() call persists to Postgres and enqueues derivation tasks. The deriver processes these, updating Alice's self-representation and the tutor's model of Alice. When alice.chat() fires, Honcho queries these representations through the dialectic pipeline (configurable reasoning levels: minimal/low/medium/high/max). The context() endpoint performs intelligent compression—keeping relevant conclusions, summaries, and recent messages within your token budget.
Example 2: TypeScript Equivalent with Async Patterns
import { Honcho } from "@honcho-ai/sdk";
import OpenAI from "openai";
// Same initialization pattern, fully typed
const honcho = new Honcho({
workspaceId: "my-app-testing",
apiKey: process.env.HONCHO_API_KEY,
});
// All operations are async/await native
const alice = await honcho.peer("alice");
const tutor = await honcho.peer("tutor");
const session = await honcho.session("session-1");
// Batch message ingestion
await session.addMessages([
alice.message("Hey there — can you help me with my math homework?"),
tutor.message("Absolutely. Send me your first problem!"),
]);
// Query with explicit typing on responses
const answer = await alice.chat(
"What learning styles does the user respond to best?"
);
// Context with options object, OpenAI-compatible output
const context = await session.context({ summary: true, tokens: 10_000 });
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
model: process.env.OPENAI_MODEL ?? "gpt-4o-mini",
messages: context.toOpenAI({ assistant: tutor }),
});
Key TypeScript advantage: Full type safety on context.toOpenAI() ensures your message format matches exactly what OpenAI expects—no runtime surprises.
Example 3: MCP Integration for Claude Code (Instant Memory)
The fastest way to give existing agents persistent memory. No codebase changes required:
# Rich plugin integration (recommended for Claude Code)
/plugin marketplace add plastic-labs/claude-honcho
/plugin install honcho@honcho
# Raw MCP for any compatible client: Cursor, Cline, Windsurf, etc.
claude mcp add honcho \
--transport http \
--url "https://mcp.honcho.dev" \
--header "Authorization: Bearer hch-your-key-here" \
--header "X-Honcho-User-Name: YourName"
Why this matters: Your Claude Code sessions now persist across conversations. It remembers your project structure preferences, your debugging patterns, your architectural decisions. The X-Honcho-User-Name header enables multi-user isolation—team members get distinct memory spaces.
Example 4: OpenClaw Migration (Non-Destructive)
# Install the plugin
openclaw plugins install @honcho-ai/openclaw-honcho
# Interactive setup: prompts for API key, writes config,
# optionally migrates legacy memory files
openclaw honcho setup
# Force gateway restart to load new configuration
openclaw gateway --force
Critical detail: Original MEMORY.md, USER.md, IDENTITY.md files are never deleted. Honcho ingests and enhances—no destructive operations. This is production-safe migration.
Example 5: Configuration Flexibility (TOML + Environment)
# Copy example configuration
cp config.toml.example config.toml
# config.toml - Organized by subsystem
[app]
LOG_LEVEL = "INFO"
SESSION_LIMIT = 1000
[db]
CONNECTION_URI = "postgresql+psycopg://localhost/honcho_dev"
POOL_SIZE = 10
[deriver]
# Background worker settings for representation generation
[dialectic]
# Chat endpoint with per-level reasoning configuration
# low: fast, minimal reasoning
# max: deep, comprehensive analysis
Override in production without touching files:
# Environment variables take highest priority
export DB_CONNECTION_URI="postgresql+psycopg://prod/honcho"
export DIALECTIC_LEVELS__high__MODEL_CONFIG__MODEL="claude-3-5-sonnet-20241022"
export DERIVER_MODEL_CONFIG__TRANSPORT="anthropic"
The __ separator enables nested overrides. This hierarchy—env > .env > config.toml > defaults—means you can commit safe defaults and inject secrets at runtime.
Advanced Usage & Best Practices
Scale derivers horizontally. The background worker is your bottleneck. Run uv run python -m src.deriver in multiple processes to increase throughput. Monitor honcho.queue_status() to detect backlog.
Use representations for low-latency paths. peer.representation() returns cached, static snapshots—no LLM call required. Perfect for prompt injection where speed matters more than freshness.
Configure observation scopes carefully. By default, peers observe all session participants. For sensitive multi-tenant scenarios, restrict observation to prevent information leakage between users.
Leverage the conclusions API directly. While peer.chat() is convenient, the raw conclusions endpoint gives you structured data for custom reasoning pipelines.
Token budget aggressively. session.context(tokens=10_000) is your friend. Honcho's summarization is optimized for relevance—trust the compression and save money on LLM calls.
Embed documents for RAG hybridity. session.upload_file() ingests documents into the vector store, making them searchable alongside conversational memory. This bridges explicit knowledge bases with implicit learned representations.
Comparison with Alternatives
| Capability | Honcho | Vector DB (Pinecone/Weaviate) | Simple Memory (LangChain) | Redis/Key-Value |
|---|---|---|---|---|
| Reasoning over time | ✅ Native background deriver | ❌ None | ❌ Manual chunking | ❌ None |
| Peer psychology model | ✅ First-class entity tracking | ❌ Flat embeddings | ❌ No entity concept | ❌ Key-value only |
| Multi-perspective | ✅ Observer/observed pairs | ❌ Single embedding space | ❌ No cross-peer modeling | ❌ No |
| Natural language queries | ✅ Chat endpoint with reasoning | ❌ Similarity search only | ⚠️ Requires LLM wrapper | ❌ No |
| Prompt-ready context | ✅ Built-in token management | ❌ Manual assembly | ⚠️ Basic truncation | ❌ Manual |
| MCP/agent integrations | ✅ Native plugins | ❌ None | ⚠️ Framework-dependent | ❌ No |
| Self-hostable | ✅ AGPL-3.0, Docker, source | Varies | ✅ | ✅ |
| Managed option | ✅ api.honcho.dev | ✅ | ❌ | ✅ |
The verdict: Vector databases store. Honcho understands. LangChain memory is a wrapper layer—you're still doing the cognitive work. Honcho's deriver pipeline is the differentiator: autonomous, background, psychology-informed reasoning that compounds over time.
FAQ
Q: How does Honcho differ from RAG? A: RAG retrieves chunks. Honcho extracts conclusions and builds dynamic psychological models. Ask "What's this user's biggest frustration?"—RAG returns mentions of "frustrated"; Honcho returns synthesized insights from patterns across months.
Q: Is my data safe with the managed service? A: Workspaces provide strict isolation. For maximum control, self-host under AGPL-3.0—full source, full auditability, full data sovereignty.
Q: What LLM providers does Honcho support? A: Configurable: Gemini (default for deriver/summary/low reasoning), Anthropic (medium/high/max reasoning, dream), OpenAI (embeddings). Mix and match per subsystem.
Q: How fast is the reasoning?
A: Background processing is asynchronous. For immediate needs, use representation() endpoints (cached, low-latency) or configure faster dialectic levels. Fresh messages may take seconds to propagate.
Q: Can I migrate existing memory systems?
A: OpenClaw integration non-destructively migrates MEMORY.md/USER.md/IDENTITY.md. The npx skills add tool generates SDK integration code for custom migrations.
Q: What's the pricing for managed? A: $100 free credits on signup. Beyond that, usage-based pricing at app.honcho.dev.
Q: How do I debug background processing?
A: honcho.queue_status() exposes the derivation queue. Logs from the deriver worker show task processing. Increase LOG_LEVEL to DEBUG for visibility.
Conclusion
The agents that win won't be the ones with the biggest context windows or the flashiest tool chains. They'll be the ones that remember—that build genuine understanding of the humans and systems they interact with over months and years. Honcho isn't just storage infrastructure; it's cognitive infrastructure.
I've evaluated every memory solution in this space. Nothing else combines reasoning-first architecture, peer-centric modeling, and production-ready integrations this seamlessly. The Pareto Frontier claim isn't marketing—it's benchmarked, reproducible, and publicly documented.
Stop building forgetful agents. Stop treating memory as an afterthought. Your competitive moat is waiting.
Get started now: Sign up for $100 free credits · Star the repo · Read the evals
The future of AI isn't just intelligent. It's remembered.
Comments (0)
No comments yet. Be the first to share your thoughts!