Stop Losing AI Agent Context: Chorus Is the Harness You Need

Your AI agent just crashed mid-deployment. Three hours of context evaporated. The half-finished migration script? Gone. The carefully crafted prompt chain? Scattered across five terminal windows. You're staring at a blank screen, wondering whether to rebuild from scratch or admit defeat and call it a night.

Sound familiar? Here's the brutal truth: raw LLM agents are fragile. They start strong, execute brilliantly for twenty minutes, then die silently—taking your entire session state, task dependencies, and hard-won progress with them. No recovery. No audit trail. Just you, alone, picking up the pieces.

But what if your agents had infrastructure? Not just another wrapper, but a genuine harness that manages their entire lifecycle—session persistence, heartbeat monitoring, automatic failure recovery, and seamless handoffs between AI and human collaborators?

Enter Chorus, the open-source agent harness that transforms chaotic AI agent execution into a structured, observable, recoverable workflow. Born from the AI-Driven Development Lifecycle (AI-DLC) methodology pioneered by AWS, Chorus implements a radical philosophy: Reversed Conversation—AI proposes, humans verify. No more blind trust in autonomous agents. No more catastrophic context loss. Just reliable, collaborative AI-human development at scale.

Ready to stop babysitting your agents? Let's dive deep into what makes Chorus the infrastructure layer AI-native teams are quietly adopting.

What Is Chorus?

Chorus is an agent harness—the foundational infrastructure that wraps around Large Language Model (LLM) agents to manage their complete operational lifecycle. Think of it as Kubernetes for AI agents: not the agent itself, but the orchestration layer that makes multi-agent systems production-ready.

Created by the Chorus-AIDLC organization and inspired directly by AWS's AI-Driven Development Lifecycle methodology, Chorus addresses a critical gap in the current AI tooling landscape. While frameworks like LangChain and CrewAI help you build agents, Chorus helps you run them reliably in collaborative environments where humans and multiple AI systems must work together toward shared deliverables.

The project is trending now for three converging reasons:

The "agent winter" backlash—teams burned by unreliable autonomous agents are demanding observability and recovery
AI-DLC adoption—enterprise engineering organizations are formalizing AI-human collaboration workflows
The v0.7.0 permissions breakthrough—fine-grained 5×3 resource-action grids replaced crude role-based access, enabling secure multi-tenant agent deployments

At its core, Chorus enforces structured workflow stages: Idea → Proposal → [Document + Task DAG] → Execute → Verify → Done. Each stage has clear ownership—PM agents analyze and plan, dev agents execute with code, admin humans review and verify. This isn't theoretical; it's the exact pattern AWS documented for scaling AI-driven development in regulated environments.

The "Reversed Conversation" philosophy flips dangerous "auto-pilot" narratives. AI drives proposal and execution, but humans retain veto power at critical gates. This isn't slowing teams down—it's preventing the catastrophic errors that destroy trust in AI tooling and create organizational resistance to adoption.
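One way to picture the stage order and its human gates is as a small state machine. The sketch below is purely illustrative: `Stage`, `HUMAN_GATED`, `can_advance`, and `advance` are hypothetical names rather than Chorus's API, and it assumes the gates sit on the transitions out of Proposal and Verify.

```python
from enum import Enum


class Stage(Enum):
    IDEA = 1
    PROPOSAL = 2
    PLAN = 3       # stands in for "[Document + Task DAG]"
    EXECUTE = 4
    VERIFY = 5
    DONE = 6


# Assumption: a human must approve before work leaves these stages.
HUMAN_GATED = {Stage.PROPOSAL, Stage.VERIFY}


def can_advance(current: Stage, human_approved: bool = False) -> bool:
    """True if the workflow may move on from `current`."""
    if current is Stage.DONE:
        return False
    return current not in HUMAN_GATED or human_approved


def advance(current: Stage, human_approved: bool = False) -> Stage:
    """Return the next stage, refusing to pass a gate without approval."""
    if not can_advance(current, human_approved):
        raise RuntimeError(f"cannot advance from {current.name}")
    return Stage(current.value + 1)
```

An agent can move an Idea to a Proposal on its own, but `advance(Stage.PROPOSAL)` fails until a human passes `human_approved=True`.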

Key Features That Separate Chorus from Agent Frameworks

Chorus isn't another agent construction kit. It's operational infrastructure with capabilities that only matter when you're running agents in production:

Session Lifecycle Management with Automatic Recovery

Chorus maintains persistent sessions with heartbeat monitoring and auto-expiry. When an agent disconnects—network blip, API rate limit, or outright crash—the session state persists. On reconnection, the agent resumes exactly where it left off, with full context intact. The harness handles the messy reality that "stateless" agent designs ignore: long-running tasks fail mid-execution, and recovery must be automatic, not manual.
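The heartbeat-and-resume behavior can be sketched in a few lines. This is a minimal illustration, not Chorus's implementation: `Session`, `SessionRegistry`, and the 30-second timeout are assumptions, and auto-expiry of long-dead sessions is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    agent_id: str
    last_heartbeat: float                           # seconds on any monotonic clock
    checkpoint: dict = field(default_factory=dict)  # saved task progress


class SessionRegistry:
    """Tracks sessions; a missed heartbeat disconnects but keeps state."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s  # illustrative value, not a Chorus default
        self.sessions: dict[str, Session] = {}

    def heartbeat(self, agent_id: str, now: float) -> Session:
        # A reconnecting agent gets its existing session and checkpoint back.
        session = self.sessions.setdefault(agent_id, Session(agent_id, now))
        session.last_heartbeat = now
        return session

    def status(self, agent_id: str, now: float) -> str:
        session = self.sessions[agent_id]
        stale = now - session.last_heartbeat > self.timeout_s
        return "disconnected" if stale else "active"
```

The key design point is that a stale session is only marked disconnected and never deleted, so the checkpoint is still there when the next heartbeat arrives.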

Task DAG with Dependency Modeling and Cycle Detection

Real projects aren't linear. Chorus models tasks as a Directed Acyclic Graph with explicit dependencies, parallel execution paths, and interactive visualization. The system detects cycles before execution—preventing infinite loops that plague naive agent orchestration—and optimizes scheduling based on dependency resolution.
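To make cycle detection and dependency-based scheduling concrete, here is a small sketch using Python's standard `graphlib` module (Python 3.9+). It does not reflect how Chorus implements its DAG; the `execution_waves` function and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves whose members can run in parallel.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # cycle detection happens here, before any task runs
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task that is runnable now
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    tasks = {
        "schema": set(),
        "api": {"schema"},
        "ui": {"schema"},
        "e2e-tests": {"api", "ui"},
    }
    print(execution_waves(tasks))
    try:
        execution_waves({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        print("rejected:", err.args[0])
```

Each wave contains tasks whose dependencies are all complete, which is what allows independent branches such as `api` and `ui` to run side by side.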

Fine-Grained Agent Permissions (v0.7.0+)

This feature replaces the coarse PM/Developer/Admin role trio. Chorus implements a permission grid of 5 resources × 3 actions, 15 cells in total: the resources are Ideas, Proposals, Tasks, Documents, and Projects; the actions are Read, Write, and Admin. Presets provide quick configuration, while Custom combinations enable precise least-privilege access. This matters most when you are running untrusted third-party agents or isolating customer data in multi-tenant deployments.
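A resource-by-action grid like this maps naturally onto bit flags. The following sketch is an assumption about how such a grid could be modeled, not Chorus's actual data model; `Action`, `make_grid`, and `is_allowed` are hypothetical names.

```python
from enum import Flag, auto


class Action(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    ADMIN = auto()


RESOURCES = ("ideas", "proposals", "tasks", "documents", "projects")


def make_grid(**grants: Action) -> dict[str, Action]:
    """Build a full grid; resources that are not listed get no access."""
    unknown = set(grants) - set(RESOURCES)
    if unknown:
        raise ValueError(f"unknown resources: {sorted(unknown)}")
    return {resource: grants.get(resource, Action.NONE) for resource in RESOURCES}


def is_allowed(grid: dict[str, Action], resource: str, action: Action) -> bool:
    """Check one cell of the grid; unknown resources are denied."""
    return action in grid.get(resource, Action.NONE)


# A hypothetical custom combination for a narrowly scoped developer agent.
scoped_dev = make_grid(
    tasks=Action.READ | Action.WRITE,
    documents=Action.READ | Action.WRITE,
    proposals=Action.READ,
)
```

Defaulting every unlisted resource to `Action.NONE` is what keeps a custom combination least-privilege: access exists only where it was granted explicitly.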

Multi-Agent Collaboration with Swarm Mode

Chorus supports Claude Code Agent Teams operating in parallel execution mode. Multiple specialized agents—PM, developer, reviewer—coordinate through shared state rather than passing messages ad-hoc. The /yolo skill (v0.6.1+) automates the full AI-DLC pipeline: Idea → Proposal → Execute → Verify with parallel agent team execution.

Real-Time Observability and Presence

The Pixel Workspace displays agent activity as animated pixel characters with live terminal output streaming. Kanban boards update automatically as agents progress tasks through To Do → In Progress → Verify. This isn't vanity UI—it's operational visibility that lets human supervisors intervene before agents derail, not after.

Embedded PGlite for Zero-Dependency Deployment

Chorus ships with embedded PostgreSQL via PGlite: no Docker, no external database, no configuration files. After installation, a single chorus command starts the server and runs schema migrations automatically. For production scale, switch to external PostgreSQL by setting an environment variable; no code changes are required.

Real-World Use Cases Where Chorus Transforms AI Operations

1. Regulated Enterprise Software Development

Financial services and healthcare organizations can't let AI agents autonomously deploy code. Chorus enforces human verification gates at Proposal and Verify stages, with complete audit trails showing exactly which agent proposed what change, when, and with what evidence. The activity stream provides session attribution for compliance reviews that raw agent logs cannot.

2. Multi-Agent Parallel Feature Development

A platform team needs to ship three independent features simultaneously. Chorus assigns each feature to a dedicated agent team with isolated permissions, manages their task DAGs independently, and surfaces conflicts through the universal search system. PM agents draft proposals in parallel; admin humans batch-review; dev agents execute approved workstreams concurrently.

3. 24/7 Automated Operations with Human Escalation

SRE teams configure Chorus agents with read-only production access and write access to runbooks. Agents monitor, diagnose, and propose fixes autonomously—but Chorus requires human admin verification before any mutating action. The session recovery ensures that if an agent's connection drops during a 3 AM incident, another agent or human can resume the exact diagnostic context.

4. Cross-Functional Requirements Elaboration

Before any code is written, Chorus PM agents conduct structured Q&A rounds with stakeholders through the Idea panel. Requirements get clarified, categorized, and tagged—preventing the "build the wrong thing" failure mode that wastes 40% of traditional development effort. The elaboration history persists as structured data, not lost in Slack threads.

5. OpenSpec-Compliant Design Documentation

For teams adopting the OpenSpec standard, Chorus v0.8.0+ enables PM agents to author proposals in structured proposal.md + design.md + specs/<capability>/spec.md layouts. Reviewers see predictable shapes (## ADDED Requirements, ### Requirement:, #### Scenario:) instead of parsing free-form Markdown. Local files remain the working copy; Chorus mirrors to documentDrafts for collaboration.
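Predictable headings also make simple automated checks possible. The sketch below is a hypothetical shape check, not the openspec CLI's validator or anything shipped with Chorus; it assumes each requirement should sit under `## ADDED Requirements` and carry at least one scenario.

```python
def check_spec_shape(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems: list[str] = []
    scenarios: dict[str, int] = {}  # requirement name -> scenario count
    current = None                  # requirement currently being read
    saw_added = False

    for raw in markdown.splitlines():
        line = raw.strip()
        if line == "## ADDED Requirements":
            saw_added = True
        elif line.startswith("### Requirement:"):
            current = line.removeprefix("### Requirement:").strip()
            scenarios[current] = 0
            if not saw_added:
                problems.append(
                    f"requirement '{current}' is outside '## ADDED Requirements'"
                )
        elif line.startswith("#### Scenario:"):
            if current is None:
                problems.append("scenario appears before any requirement")
            else:
                scenarios[current] += 1

    if not saw_added:
        problems.append("missing '## ADDED Requirements' section")
    problems += [
        f"requirement '{name}' has no scenario"
        for name, count in scenarios.items()
        if count == 0
    ]
    return problems
```

A reviewer or CI job could run a check like this over `specs/<capability>/spec.md` before a proposal reaches human review.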

Step-by-Step Installation & Setup Guide

Chorus prioritizes radical simplicity for initial deployment while scaling to production complexity. Choose your path:

The Two-Command Local Start (No Dependencies)

npm install -g @chorus-aidlc/chorus
chorus

That's it. Chorus starts with PGlite (embedded PostgreSQL), runs migrations automatically, and opens at http://localhost:8637.

Default credentials: admin@chorus.local / chorus

Critical Note: PGlite is single-process embedded PostgreSQL—excellent for individual developers, but its connection handling saturates under concurrent load. For multiple simultaneous agents or users, use external PostgreSQL.

Configuration Options

# Custom port for conflicting services
chorus --port 3000

# Custom data directory (default: ~/.chorus-data)
chorus --data-dir /path/to/data

# Override default credentials for team environments
DEFAULT_USER=me@example.com DEFAULT_PASSWORD=secret chorus

# Production-grade external database
DATABASE_URL=postgresql://user:pass@host:5432/chorus chorus

Docker Deployment (Recommended for Teams)

Standalone with embedded PostgreSQL:

git clone https://github.com/Chorus-AIDLC/chorus.git
cd chorus

DEFAULT_USER=admin@example.com DEFAULT_PASSWORD=changeme \
  docker compose -f docker-compose.local.yml up -d

Full production stack (PostgreSQL + Redis + Chorus):

DEFAULT_USER=admin@example.com DEFAULT_PASSWORD=changeme \
  docker compose up -d

Redis enables SSE push notifications and horizontal scaling via stateless MCP instances.

AWS Production Deployment

./install.sh

The interactive installer provisions: VPC, Aurora Serverless v2, ElastiCache Serverless, ECS Fargate, and ALB with HTTPS. Configuration saves to default_deploy.sh for reproducible re-deploys.

Development Environment

With Docker for databases:

cp .env.example .env
pnpm docker:db      # Starts PostgreSQL + Redis containers
pnpm install
pnpm db:migrate:dev
pnpm dev            # http://localhost:8637

Without Docker (embedded PGlite on port 5433):

cp .env.example .env
pnpm install
pnpm dev:local      # Dev server with embedded database

Data is stored in .pglite/; delete the directory to reset completely.

Real Code Examples from the Repository

Chorus ships with extensive, production-tested code patterns. Here are critical examples extracted directly from the repository documentation:

Example 1: The Complete AI-DLC Workflow

The foundational workflow that Chorus orchestrates, directly from the README:

Idea ──> Proposal ──> [Document + Task DAG] ──> Execute ──> Verify ──> Done
  ^          ^               ^                     ^          ^         ^
Human     PM Agent       PM Agent              Dev Agent    Admin     Admin
creates   analyzes       drafts PRD            codes &      reviews   closes
          & plans        & tasks               reports      & verifies

Explanation: This ASCII diagram shows Chorus's core orchestration pattern. The arrows run one way, left to right, and the carets beneath each stage mark who owns it. The diagram itself shows Admin ownership only at Verify and Done; the human veto gates come from the workflow rules: the PM Agent cannot proceed to Execute without Admin approval of the proposal, and the Dev Agent cannot mark work Done without Admin verification. This "Reversed Conversation" pattern prevents the autonomous agent runaway that destroys production systems.

Example 2: Two-Command npm Installation

From the Quick Start section, the deployment command that eliminates setup friction:

npm install -g @chorus-aidlc/chorus
chorus

Explanation: The global installation flag (-g) makes Chorus available as a system command. The package name @chorus-aidlc/chorus is scoped to an npm organization, which protects the namespace. Running chorus without arguments automatically (1) starts the embedded PGlite database on port 5433, (2) runs Prisma schema migrations, (3) starts the Next.js application server on port 8637, and (4) opens the browser. This two-command deployment rivals the simplicity of npx create-next-app, but for full-stack agent infrastructure.

Example 3: Docker Compose with Environment Configuration

The standalone Docker deployment, with credentials injected through environment variables:

DEFAULT_USER=admin@example.com DEFAULT_PASSWORD=changeme \
  docker compose -f docker-compose.local.yml up -d

Explanation: Prefixing the command with environment variables (DEFAULT_USER=...) injects configuration without modifying files, which is useful in CI/CD pipelines and for secret management. The -f docker-compose.local.yml flag selects the embedded-PGlite variant; omitting -f uses docker-compose.yml, which brings up PostgreSQL and Redis alongside Chorus. The -d flag runs the containers detached, in the background. The same pattern works across local development, staging, and production; only the compose file and environment variables change.

Example 4: External PostgreSQL Connection String

The production database configuration pattern:

DATABASE_URL=postgresql://user:pass@host:5432/chorus chorus

Explanation: This is the standard PostgreSQL connection URI format, supplied through an environment variable. When DATABASE_URL is set, Chorus skips PGlite initialization and connects directly to the specified server. This opens the door to connection pooling via PgBouncer, read replicas for scaling, and managed database services (RDS, Cloud SQL, AlloyDB). Because the variable is read from the process environment, expect to restart Chorus after rotating credentials unless a pooler or proxy in front of the database absorbs the change. The chorus command itself is unchanged, so the CLI interface is identical regardless of database backend.
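Unlike the repository examples in this section, the following is an illustrative sketch of the selection logic just described, not code from Chorus. `describe_backend` is a hypothetical helper; it assumes an unset `DATABASE_URL` means embedded PGlite, and it falls back to PostgreSQL's default port 5432 when the URI omits one.

```python
import os
from urllib.parse import urlsplit


def describe_backend(env: dict[str, str]) -> dict[str, object]:
    """Decide which database backend a given environment would select."""
    url = env.get("DATABASE_URL")
    if not url:
        return {"backend": "pglite"}  # embedded database, no URL needed
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "backend": "postgresql",
        "host": parts.hostname,
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
        "user": parts.username,  # the password is deliberately left out
    }


if __name__ == "__main__":
    print(describe_backend(dict(os.environ)))
```

Leaving the password out of the returned summary keeps it safe to log when debugging which backend a deployment picked up.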

Example 5: Custom Port and Data Directory Configuration

Extended CLI options for specialized deployments:

# Custom port for reverse proxy or port-conflicted environments
chorus --port 3000

# Custom data directory for persistent volume mounting
chorus --data-dir /path/to/data

Explanation: The --port option enables Chorus behind reverse proxies (nginx, Traefik, AWS ALB) or alongside other services. The --data-dir option is essential for containerized deployments where ephemeral filesystems destroy data on restart—mount this to persistent volumes (EBS, EFS, host bind-mounts). These flags demonstrate Chorus's twelve-factor app compliance: configuration via environment and flags, not baked into images or source code.

Advanced Usage & Best Practices

Permission Grid Optimization

Don't default to Admin presets. The 5×3 resource-action grid enables precise capability restrictions:

Read-only analysts: Read on Ideas, Proposals, Documents; no Task or Project access
Scoped developers: Read/Write on Tasks and Documents; Read-only on Proposals they execute
External contractors: Custom combinations with time-bounded API keys

Review docs/PERMISSIONS.md for the complete matrix and inheritance rules.

Session Recovery Tuning

Default heartbeat intervals suit stable networks. For mobile or unreliable connections:

# Reduce heartbeat interval (milliseconds)
CHORUS_HEARTBEAT_MS=5000 chorus

Shorter intervals detect failures faster but increase server load. See src/app/api/notifications/README.md for details on SSE push reliability.
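The trade-off is easy to estimate with back-of-the-envelope arithmetic. In the sketch below, `heartbeat_tradeoff` is a hypothetical helper, and the rule that a session counts as disconnected after three missed heartbeats is an assumption made for illustration, not a documented Chorus setting.

```python
def heartbeat_tradeoff(
    agents: int, interval_ms: int, missed_before_disconnect: int = 3
) -> tuple[float, float]:
    """Return (heartbeat requests per second, worst-case detection delay in seconds)."""
    requests_per_s = agents * 1000 / interval_ms
    detection_s = interval_ms * missed_before_disconnect / 1000
    return requests_per_s, detection_s


if __name__ == "__main__":
    for interval_ms in (30_000, 5_000):
        load, delay = heartbeat_tradeoff(agents=50, interval_ms=interval_ms)
        print(f"{interval_ms} ms: {load:.1f} req/s, failure seen within {delay:.0f} s")
```

Under these assumptions, 50 agents at a 5000 ms interval generate about 10 heartbeat requests per second and a failure is noticed within roughly 15 seconds; lengthening the interval lowers the load and delays detection in the same proportion.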

MCP Tool Permission Gating

Chorus exposes 50+ MCP tools at /api/mcp with automatic permission enforcement. Agents receive 403 Forbidden for unauthorized tool calls—fail closed, not open. Audit agent tool usage patterns through the Activity Stream to identify permission misconfigurations or anomalous behavior.
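"Fail closed" means that anything not explicitly permitted is denied, including tools the gate has never heard of. A minimal sketch of that rule follows; the tool names, `TOOL_REQUIREMENTS`, and `authorize` are hypothetical and are not taken from the Chorus MCP tool list.

```python
# Hypothetical tool names and requirements, for illustration only.
TOOL_REQUIREMENTS: dict[str, tuple[str, str]] = {
    "task_update": ("tasks", "write"),
    "proposal_get": ("proposals", "read"),
}


def authorize(tool: str, grants: set[tuple[str, str]]) -> int:
    """Return an HTTP-style status: 200 if allowed, 403 otherwise."""
    required = TOOL_REQUIREMENTS.get(tool)
    if required is None:
        return 403  # unknown tool: deny rather than guess
    return 200 if required in grants else 403


if __name__ == "__main__":
    dev_grants = {("tasks", "read"), ("tasks", "write")}
    for tool in ("task_update", "proposal_get", "not_a_real_tool"):
        print(tool, authorize(tool, dev_grants))
```

The important branch is the first one: a tool with no registered requirement is rejected, so adding a new tool without declaring its permission cannot open a hole by accident.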

OpenSpec Mode for Structured Proposals

Enable structured design documentation when openspec CLI is installed:

# Explicitly enable (auto-detected when available)
CHORUS_OPENSPEC_MODE=on chorus

# Force disable if detected but unwanted
CHORUS_OPENSPEC_MODE=off chorus

This creates predictable review surfaces and enables automated specification validation without schema changes.

Comparison with Alternatives

Capability	Chorus	LangChain	CrewAI	AutoGPT	Raw MCP
Session Persistence	✅ Native with recovery	❌ Manual	❌ Manual	❌ Ephemeral	❌ None
Human Verification Gates	✅ Built-in workflow	❌ Custom build	❌ Custom build	❌ Optional	❌ None
Task DAG & Dependencies	✅ Native with cycle detection	⚠️ LangGraph only	⚠️ Basic	❌ Linear	❌ None
Multi-Agent Orchestration	✅ Swarm mode with permissions	⚠️ Complex config	✅ Simple teams	⚠️ Unreliable	❌ Single agent
Real-Time Observability	✅ Pixel workspace + Kanban	❌ External tools	❌ External tools	❌ Logs only	❌ None
Embedded Deployment	✅ PGlite, 2 commands	❌ External DB	❌ External DB	❌ Complex setup	❌ Manual
Production Permissions	✅ 5×3 grid + presets	❌ Application-level	❌ Role-based	❌ None	❌ Manual
AI-DLC Compliance	✅ Native methodology	❌ None	❌ None	❌ None	❌ None

The verdict: LangChain and CrewAI help you construct agents; Chorus helps you operate them reliably. AutoGPT demonstrated why autonomous agents without harnesses fail in production. Raw MCP is powerful but leaves every operational concern as your responsibility. Chorus fills the infrastructure gap that these tools deliberately avoid.

FAQ: Critical Questions from Production Teams

Is Chorus a replacement for LangChain or CrewAI?

No—it's complementary. Build agents with your preferred framework; Chorus provides the operational harness for session management, state persistence, and human collaboration. Many teams use LangChain for agent logic and Chorus for orchestration infrastructure.

How does session recovery work when an agent crashes?

Chorus maintains persistent session records with heartbeat timestamps. When heartbeats cease, the session enters disconnected state—task progress is preserved, not lost. On agent reconnection (same or different process), Chorus resumes from the last checkpoint with full context injection via the Chorus Plugin lifecycle hooks.

Can I run Chorus without any external dependencies?

Absolutely. The npm install -g @chorus-aidlc/chorus && chorus path requires nothing but Node.js. PGlite provides embedded PostgreSQL; no Docker, no cloud services, no configuration files. This is intentional for developer experience and air-gapped environments.

What happens when PGlite hits concurrent connection limits?

PGlite's single-process architecture handles ~5-10 concurrent connections gracefully. Beyond this, you'll see connection queueing or timeouts. The fix is seamless: set DATABASE_URL=postgresql://... pointing to any standard PostgreSQL 16+ server—no application changes, no data migration complexity.

How do I connect Claude Code or other agents?

Use the in-app setup wizard at Settings → Setup Guide for one-click configuration, or follow the per-client guides: Claude Code, Codex CLI, OpenCode, or generic MCP agents. The wizard generates API keys (cho_... prefix) and provides exact connection commands.

Is Chorus production-ready for regulated industries?

The AGPL-3.0 license requires you to offer source code to users who interact with a modified version over a network, so evaluate it with legal counsel. Technically, Chorus provides audit trails, permission gating, and human verification that satisfy many compliance frameworks. The AWS CDK deployment includes VPC isolation, encrypted Aurora storage, and ALB HTTPS termination.

What's the difference between `/yolo` and standard workflow?

/yolo (v0.6.1+) automates the full AI-DLC pipeline with parallel agent team execution, which suits well-understood, low-risk task types. The standard workflow enforces explicit human gates at the Proposal and Verify stages. Use /yolo for rapid prototyping and the standard workflow for production changes.

Conclusion: The Infrastructure Layer AI-Native Teams Need

Chorus solves the problem everyone building with AI agents eventually hits: agents are easy to demo, impossible to operate. The gap between a promising prototype and a reliable production system isn't smarter prompts or larger models—it's infrastructure for session management, state recovery, human oversight, and multi-agent coordination.

The Reversed Conversation philosophy isn't conservative—it's pragmatic. AI agents propose at machine speed; humans verify at comprehension speed. This rhythm prevents the catastrophic errors that create organizational resistance to AI adoption, while still capturing dramatic productivity gains.

With two-command deployment, embedded PostgreSQL for zero-config starts, fine-grained permissions for multi-tenant safety, and production paths to Docker, AWS, and external databases, Chorus scales from individual experimentation to enterprise deployment without architecture changes.

The agent harness category is emerging as the critical infrastructure layer for AI-native software development. Chorus, with its AI-DLC methodology foundation and active open-source development, is positioned as the reference implementation.

Stop losing agent context. Stop rebuilding from crashes. Start orchestrating AI-human collaboration with infrastructure designed for production reality.

👉 Get Chorus on GitHub — star the repo, try the two-command install, and join the Discord community to share your agent orchestration patterns.

Built with Next.js 15, React 19, Prisma 7, PostgreSQL 16, and MCP SDK 1.26. Licensed under AGPL-3.0.