Spec Kitty: The CLI Workflow AI Coding Agents Desperately Need
Your AI coding agent just forgot the requirements again, didn't it?
You've been there. Thirty minutes into a Claude or Cursor session, and suddenly the agent is rewriting code you already approved. Or worse—it's building features you never asked for while ignoring the critical acceptance criteria buried somewhere in chat history. The context window choked. Your requirements evaporated. And now you're manually stitching together fragments of intent from a scrolling graveyard of forgotten prompts.
Here's the brutal truth: AI coding agents are incredibly powerful and fundamentally chaotic without structure. They don't forget on purpose, but they absolutely will forget. Every. Single. Time. Without a system.
That's exactly why developers are quietly abandoning raw chat interfaces and flocking to a new open-source CLI that's turning product intent into repeatable, reviewable, mergeable agent workflows. It's called Spec Kitty—and if you're serious about shipping production software with AI assistance, you need to understand what it does differently.
What Is Spec Kitty?
Spec Kitty is an open-source CLI tool for spec-driven development with AI coding agents, created by Priivacy AI and available on GitHub at github.com/Priivacy-ai/spec-kitty. It is written in Python (3.11 or newer is required) and transforms how developers collaborate with Claude, Cursor, Gemini, Codex, and a growing ecosystem of AI coding tools.
The core philosophy is elegantly simple: your repository becomes the source of truth for everything. Not a chat window. Not a cloud service you don't control. Your actual Git repository stores specs, plans, tasks, reviews, and merge state—making AI collaboration traceable, reproducible, and team-friendly.
Spec Kitty is trending right now because it solves the exact pain point that exploded in 2024-2025: developers adopted AI coding agents faster than they adopted workflows to manage them. Everyone got excited about vibe coding, then everyone hit the same wall—unstructured agent sessions produce unstructured, unreviewable, unmergeable code.
The tool enforces a disciplined lifecycle:
spec → plan → tasks → next → review → accept → merge
This isn't bureaucracy for bureaucracy's sake. It's the minimum viable process that prevents agents from spiraling into chaos. And it does this while staying local-first—no mandatory cloud dependencies, no vendor lock-in, no surprise subscription fees.
Key Features That Separate Spec Kitty from the Chaos
Repository-Native Mission Artifacts
Every spec, plan, and task lives under kitty-specs/ in your repo. This means:
- Git history tracks your decisions—no more "what did we agree on in hour three?"
- Code reviews include the spec—reviewers see intent, not just implementation
- CI/CD can validate against acceptance criteria—automated gates become possible
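To make the CI gate idea concrete, here is a minimal Python sketch that walks kitty-specs/ and reports missions whose spec has no acceptance-criteria section. The spec.md filename follows the layout described in this article; the "acceptance criteria" heading text and the function name are assumptions for illustration, not Spec Kitty's documented format.

```python
from pathlib import Path


def missions_missing_criteria(specs_root: Path, heading: str = "acceptance criteria") -> list[str]:
    """Return mission directory names whose spec.md is absent or lacks the heading."""
    missing = []
    for mission_dir in sorted(p for p in specs_root.iterdir() if p.is_dir()):
        spec = mission_dir / "spec.md"
        # Case-insensitive substring match; the heading text is an assumption.
        if not spec.is_file() or heading not in spec.read_text(encoding="utf-8").lower():
            missing.append(mission_dir.name)
    return missing
```

A CI job could call missions_missing_criteria(Path("kitty-specs")) and fail the build when the returned list is not empty.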
Git Worktrees for Isolated Implementation
Spec Kitty creates isolated git worktrees under .worktrees/. Here's why this matters technically:
- No branch-switching overhead—agents work in parallel workspaces without touching your main working tree
- Clean context boundaries—each mission gets its own filesystem, preventing cross-contamination
- Easy comparison and rollback—review changes in isolation before any merge risk
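If you have not used worktrees directly, the sketch below shows roughly what creating one under .worktrees/ involves. It uses plain git worktree add; the helper names and the one-branch-per-worktree naming are assumptions, and Spec Kitty's own implementation may differ.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, name: str, base: str = "HEAD") -> list[str]:
    """Build a `git worktree add` command creating branch `name` in .worktrees/<name>."""
    # The target path is relative because `git -C repo` resolves it against the repo root.
    return ["git", "-C", str(repo), "worktree", "add", "-b", name, f".worktrees/{name}", base]


def create_worktree(repo: Path, name: str, base: str = "HEAD") -> Path:
    """Run the command; raises CalledProcessError if git refuses (e.g. the branch exists)."""
    subprocess.run(worktree_add_command(repo, name, base), check=True)
    return repo / ".worktrees" / name
```

Each worktree is a separate checkout directory backed by the same repository, which is why an agent can edit files there without disturbing your main working tree.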
Kanban Dashboard with spec-kitty dashboard
A local web dashboard visualizes mission progress across lifecycle lanes: planned, in_progress, for_review, and done. For teams juggling multiple agent missions, this replaces the spreadsheet hacks and mental juggling acts.
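The four lanes behave like a small state machine. The sketch below models them in Python; the lane names come from the dashboard description, while the allowed transitions (one step forward, plus review sending work back) are an assumption rather than documented behavior.

```python
from enum import Enum


class Lane(str, Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in_progress"
    FOR_REVIEW = "for_review"
    DONE = "done"


# Assumed transitions: forward one lane at a time, plus review returning work for rework.
ALLOWED = {
    Lane.PLANNED: {Lane.IN_PROGRESS},
    Lane.IN_PROGRESS: {Lane.FOR_REVIEW},
    Lane.FOR_REVIEW: {Lane.DONE, Lane.IN_PROGRESS},
    Lane.DONE: set(),
}


def move(current: Lane, target: Lane) -> Lane:
    """Return the target lane, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```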
Multi-Agent Orchestration
Spec Kitty supports slash commands for Claude, Cursor, Gemini, Codex, Copilot, OpenCode, Qwen, Windsurf, Kiro, Vibe, Pi, Letta, and more. The --agent flag lets you route work to different agents based on their strengths—Claude for architecture, Cursor for implementation, Gemini for testing, for example.
Auto-Merge with --push
Once review and acceptance are complete, spec-kitty merge --push handles the final integration. The entire lifecycle becomes scriptable and automatable.
Real-World Use Cases Where Spec Kitty Shines
Use Case 1: The Vanishing Requirements Problem
Scenario: You're building a payment integration. The agent implemented Stripe, but forgot the PCI compliance requirements from your original prompt. Three hours of refactoring later, you discover the audit trail requirement was never implemented.
Spec Kitty fix: /spec-kitty.specify creates a persistent spec document. The agent references it throughout. Acceptance criteria live in kitty-specs/, not chat history.
Use Case 2: The Multi-Developer Agent Team
Scenario: Three developers on your team all use Cursor, but each starts from different assumptions. One agent refactors the API while another builds features assuming the old API. Merge conflicts become architectural conflicts.
Spec Kitty fix: Shared kitty-specs/ in the repo means all agents read the same plan. Work packages have clear boundaries. The spec-kitty next command ensures agents don't step on each other.
Use Case 3: The "It Worked in Chat" Deployment Disaster
Scenario: Your agent generated beautiful code that passed all its self-tests. You copy-pasted it into the repo. Production broke because the agent never saw your actual environment variables, database schema, or existing middleware.
Spec Kitty fix: Worktrees use your actual repo context. Agents work with real files, real configs, real dependencies. The spec-kitty verify-setup command catches environment mismatches before work begins.
Use Case 4: The Review Nightmare
Scenario: You need to review 500 lines of agent-generated code. No comments explain why choices were made. The commit message says "update." Your security team rejects it outright.
Spec Kitty fix: The spec → plan → tasks chain creates natural documentation. Reviewers see the original intent, the planned approach, and the implementation. /spec-kitty.review structures the review process with explicit acceptance.
Step-by-Step Installation & Setup Guide
Prerequisites
- Python 3.11 or newer
- Git (obviously—you're using Git, right?)
- A working AI coding agent (Claude, Cursor, Gemini, etc.)
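A quick way to confirm the first two prerequisites before installing is a short Python check like the one below. It is only a convenience sketch (the function name is made up here) and does not replace spec-kitty verify-setup.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 11)) -> dict[str, bool]:
    """Report whether the running interpreter and git meet the listed prerequisites."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```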
Install the CLI
Recommended method with pipx (isolates dependencies, avoids system conflicts):
# Install pipx if you don't have it
pip install pipx
pipx ensurepath
# Install Spec Kitty
pipx install spec-kitty-cli
Alternative with uv (blazing fast, modern Python tooling):
uv tool install spec-kitty-cli
Fallback with pip (only inside activated virtual environments):
python -m pip install spec-kitty-cli
⚠️ Why pipx over pip? Modern Linux distributions mark system Python as "externally managed." pipx avoids the externally-managed-environment errors that plague bare pip install attempts.
Initialize Your First Project
# Create a new project with your preferred agent
spec-kitty init my-project --ai claude
# Or add to existing repository
cd existing-repo
spec-kitty init . --ai cursor
Verify Everything Works
cd my-project
spec-kitty verify-setup
This checks your installation, project wiring, and agent configuration. Fix any red flags before proceeding.
Configure Your Environment (Optional but Recommended)
For template development or custom setups:
# Point to custom templates during development
export SPEC_KITTY_TEMPLATE_ROOT="$(pwd)"
spec-kitty init my-project --ai claude
REAL Code Examples from the Repository
Let's walk through Spec Kitty's actual workflow using the exact commands from the README, with detailed explanations of what happens under the hood.
Example 1: The Core Workflow Initialization
/spec-kitty.charter
/spec-kitty.specify Build a small task list app.
/spec-kitty.plan
/spec-kitty.tasks
What's happening here:
These are slash commands interpreted by your AI coding agent (not shell commands). They trigger Spec Kitty's structured workflow:
- /spec-kitty.charter: Establishes the mission context. The agent reads any existing project charter and understands the high-level constraints, tech stack, and conventions.
- /spec-kitty.specify "Build a small task list app.": This is the critical step most developers skip. The agent creates a formal specification document under kitty-specs/<mission-slug>/spec.md. This spec includes:
  - Functional requirements (what the app does)
  - Non-functional requirements (performance, security, accessibility)
  - Acceptance criteria (how you know it's done)
  - Explicit exclusions (what's out of scope, which prevents scope creep!)
- /spec-kitty.plan: The agent decomposes the spec into a technical plan: architecture decisions, file structure, dependency choices, and implementation order. This lives in kitty-specs/<mission-slug>/plan.md.
- /spec-kitty.tasks: The plan becomes concrete tasks with lifecycle states. Each task gets a unique identifier, estimated complexity, and dependency graph. Tasks live in kitty-specs/<mission-slug>/tasks/.
Why this matters: Without these steps, your agent is improvising. With them, every subsequent action has a documented rationale you can review, challenge, and improve.
Example 2: The Runtime Execution Loop
spec-kitty next --agent claude --mission <mission-slug>
What's happening here:
This is where Spec Kitty's intelligence shines. Unlike raw agent chat where you manually prompt each step, next queries the mission state and determines the optimal next action:
# The CLI examines:
# 1. Current task states (planned → in_progress → for_review → done)
# 2. Git worktree status (clean? conflicts?)
# 3. Agent capabilities (what can Claude do best?)
# 4. Dependencies (are prerequisite tasks complete?)
spec-kitty next --agent claude --mission task-list-app-v1
Possible outputs the CLI might return:
Next action: Implement task creation endpoint
Task: TASK-003-create-endpoint
Worktree: .worktrees/task-list-app-v1-TASK-003/
Estimated: 15 minutes
Run: spec-kitty execute --task TASK-003
Or if review is pending:
Next action: Review completed task
Task: TASK-002-data-model (status: for_review)
Review file: kitty-specs/task-list-app-v1/reviews/TASK-002.md
Run: spec-kitty review --task TASK-002
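The selection logic described for next can be pictured with a small Python sketch: look at task states and dependencies, then pick one action. Everything here (the Task class, the rule that pending reviews come first) is a simplified assumption to illustrate the idea, not Spec Kitty's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    status: str = "planned"  # planned, in_progress, for_review, done
    depends_on: list[str] = field(default_factory=list)


def next_action(tasks: list[Task]) -> Optional[tuple[str, str]]:
    """Pick (action, task_id): pending reviews first, then the first unblocked planned task."""
    done = {t.task_id for t in tasks if t.status == "done"}
    for task in tasks:
        if task.status == "for_review":
            return ("review", task.task_id)
    for task in tasks:
        # A planned task is ready only when every dependency is already done.
        if task.status == "planned" and all(dep in done for dep in task.depends_on):
            return ("implement", task.task_id)
    return None
```

Putting reviews ahead of new work keeps finished tasks from piling up in for_review; a real scheduler might weigh priorities differently.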
The --agent flag is crucial for multi-agent teams. You might run:
# Claude handles architecture and complex logic
spec-kitty next --agent claude --mission api-redesign
# Cursor handles UI implementation
spec-kitty next --agent cursor --mission dashboard-v2
# Gemini handles test generation
spec-kitty next --agent gemini --mission test-coverage
Example 3: The Acceptance and Merge Pipeline
/spec-kitty.review
/spec-kitty.accept
/spec-kitty.merge --push
What's happening here:
This three-stage gate prevents the "agent said it's done so I guess it's done" anti-pattern:
- /spec-kitty.review: Structured review against acceptance criteria. The agent (or human!) checks each criterion from the original spec. Results are recorded in kitty-specs/<mission-slug>/reviews/.
- /spec-kitty.accept: Formal sign-off. This creates an immutable acceptance record with timestamp, reviewer identity, and any noted exceptions. Think of it as a lightweight audit trail.
- /spec-kitty.merge --push: Atomic integration. The worktree changes merge into the main branch and, with --push, are pushed to origin. No manual copy-paste. No "did I get all the files?" anxiety.
Behind the scenes, this executes approximately:
# Spec Kitty handles the git orchestration
git -C .worktrees/<mission-worktree>/ diff --name-only # What changed?
git merge-tree $(git merge-base HEAD <worktree-branch>) HEAD <worktree-branch> # Clean merge?
git merge --no-ff <worktree-branch> -m "feat: <mission-name> [accepted by <agent>]"
git push origin HEAD # If --push specified
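The acceptance record from the accept step can be modeled as a frozen dataclass, which is one simple way to get an immutable record with timestamp, reviewer, and exceptions in Python. The field names and JSON shape below are assumptions for illustration; the article does not show the tool's real record format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AcceptanceRecord:
    mission: str
    reviewer: str
    accepted_at: str
    exceptions: tuple[str, ...] = ()


def accept(mission: str, reviewer: str, exceptions: tuple[str, ...] = ()) -> AcceptanceRecord:
    """Create a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return AcceptanceRecord(mission, reviewer, now, tuple(exceptions))


# Hypothetical mission slug and reviewer, for illustration only.
record = accept("task-list-app-v1", "alice", ("pagination deferred",))
print(json.dumps(asdict(record), indent=2))
```

Because the dataclass is frozen, any later attempt to change a field raises an error, so corrections have to be made as a new record rather than a silent edit.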
Example 4: Development Setup from Source
git clone https://github.com/Priivacy-ai/spec-kitty.git
cd spec-kitty
pip install -e ".[test]"
What's happening here:
This installs Spec Kitty in editable mode with test dependencies. With the -e flag, pip links the installed package to your source checkout rather than copying files, so your code changes take effect immediately without reinstallation.
# The .[test] extras install:
# - pytest and plugins
# - coverage tools
# - linting (likely ruff, mypy)
# - any test fixtures or mocks
# Verify your development environment
pytest # Run full test suite
spec-kitty --help # Confirm CLI is accessible
For template hacking, the environment variable override is essential:
export SPEC_KITTY_TEMPLATE_ROOT="$(pwd)"
spec-kitty init my-project --ai claude
# Now uses YOUR modified templates, not the packaged defaults
Advanced Usage & Best Practices
Pro Tip 1: Mission Granularity
Don't create monolithic missions. A good mission scope is 2-4 hours of focused work. Larger missions cause context window pressure; smaller ones create overhead. The /spec-kitty.plan output should feel ambitious but achievable.
Pro Tip 2: Agent Specialization Matrix
Build a team-of-agents mental model:
| Task Type | Recommended Agent | Why |
|---|---|---|
| Architecture & refactoring | Claude | Superior reasoning, longer context |
| Feature implementation | Cursor | Excellent code generation, IDE integration |
| Test generation | Gemini | Fast, thorough edge-case coverage |
| Documentation | Any with /spec-kitty.specify | Structured output matters more than creativity |
| Security review | Dedicated security agent | Fresh perspective, adversarial thinking |
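If you want the matrix above to be more than a mental model, you could encode it as a lookup that builds the spec-kitty next invocation. The sketch below uses only the --agent and --mission flags shown earlier; the task-type keys and the fallback agent are assumptions.

```python
import shlex

# Routing table taken from the matrix above; adjust to your team's experience.
AGENT_FOR_TASK = {
    "architecture": "claude",
    "implementation": "cursor",
    "testing": "gemini",
}


def next_command(task_type: str, mission: str, default_agent: str = "claude") -> str:
    """Build the `spec-kitty next` command line for a task type and mission slug."""
    agent = AGENT_FOR_TASK.get(task_type, default_agent)
    return shlex.join(["spec-kitty", "next", "--agent", agent, "--mission", mission])
```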
Pro Tip 3: Dashboard-Driven Standups
Run spec-kitty dashboard before team syncs. The kanban view exposes blockers instantly:
# Start dashboard (opens browser)
spec-kitty dashboard
# Or get quick terminal summary
spec-kitty status --all-missions
Pro Tip 4: Upgrade Path Discipline
When Spec Kitty releases updates:
pipx upgrade spec-kitty-cli # Update CLI
spec-kitty upgrade # Migrate existing projects
spec-kitty verify-setup # Confirm everything works
The upgrade command handles template migrations and config format changes—don't skip it.
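To keep the three steps from being run out of order, you could wrap them in a small script that stops at the first failure. This is a sketch: the commands are the ones listed above, while the injectable run parameter is just a testing convenience introduced here.

```python
import subprocess
from typing import Callable, Sequence

UPGRADE_STEPS = [
    ["pipx", "upgrade", "spec-kitty-cli"],
    ["spec-kitty", "upgrade"],
    ["spec-kitty", "verify-setup"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_upgrade(run: Callable[[Sequence[str]], int] = _run) -> list[str]:
    """Run each step in order, stopping at the first non-zero exit.

    Returns the command lines that completed successfully.
    """
    completed = []
    for cmd in UPGRADE_STEPS:
        if run(cmd) != 0:
            break
        completed.append(" ".join(cmd))
    return completed
```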
Pro Tip 5: External Orchestrator Integration
For teams with existing CI/CD or project management tools, Spec Kitty supports external orchestrators. See the External Orchestrator Runbook for wiring into Jenkins, GitHub Actions, Linear, or Jira.
Comparison with Alternatives
| Feature | Spec Kitty | Raw Chat (Claude/Cursor) | GitHub Copilot Workspace | LangChain/LangGraph |
|---|---|---|---|---|
| Spec persistence | ✅ Repository-native | ❌ Chat history only | ⚠️ PR descriptions | ✅ Custom implementation |
| Git worktrees | ✅ Built-in | ❌ Manual | ❌ Branch-based | ❌ Manual |
| Local-first | ✅ No cloud required | ✅ | ⚠️ GitHub-dependent | ✅ |
| Multi-agent support | ✅ 12+ agents | ❌ Single session | ⚠️ Copilot only | ✅ With custom code |
| Kanban dashboard | ✅ Local | ❌ | ❌ | ❌ Build yourself |
| Structured review/accept | ✅ Formal gates | ❌ Informal | ⚠️ PR reviews | ✅ Custom implementation |
| Setup complexity | ⚠️ CLI install | ✅ Zero | ✅ GitHub native | ❌ Significant |
| Open source | ✅ MIT | N/A | ❌ | ✅ Various |
When to choose Spec Kitty over alternatives:
- vs. Raw Chat: You ship production code, not prototypes. You need audit trails and team coordination.
- vs. Copilot Workspace: You want vendor independence, local control, and structured lifecycle management.
- vs. LangChain: You need an opinionated workflow today, not a framework to build one over weeks.
FAQ
Is Spec Kitty free?
Yes. Spec Kitty is MIT-licensed open source. You pay nothing for the CLI. You still need API access to your chosen AI agents (Claude Pro, Cursor subscription, etc.).
Does it work with my existing codebase?
Absolutely. Run spec-kitty init . --ai <agent> in any Git repository. It adds kitty-specs/ and .worktrees/ to your existing structure without disrupting current workflows.
What if my team doesn't use AI agents yet?
Spec Kitty works for human-driven spec-driven development too. The structured workflow benefits any team writing specs before code. Add agents when you're ready.
How does this compare to Jira or Linear?
Jira/Linear track work; Spec Kitty orchestrates execution. They're complementary—Spec Kitty can sync to external trackers (see Hosted Sync Workspaces), but its unique value is the agent-facing workflow and git-native storage.
Can I use multiple agents on one mission?
Yes, strategically. Use --agent to route specific tasks. However, each task should have one responsible agent to maintain coherence. The multi-agent orchestration docs cover patterns for handoffs.
What happens if spec-kitty next chooses wrong?
You're always in control. next is a recommendation, not a command. Override with explicit spec-kitty execute --task <specific-task> or adjust task priorities in the dashboard.
Is my code sent to Priivacy AI's servers?
No. Spec Kitty is local-first. Your specs, plans, and code stay in your repository. Only optional hosted sync features (if you enable them) would involve external services.
Conclusion: The Missing Piece in Your AI Coding Stack
Here's what I believe after digging deep into Spec Kitty: We've been using AI coding agents like they're magic typewriters, when they're actually junior developers who need management. The agents aren't the problem. The workflow is.
Spec Kitty gives you that workflow without the enterprise baggage. It's lightweight enough for solo developers, structured enough for teams, and principled enough for production software. The spec → plan → tasks → next → review → accept → merge lifecycle isn't bureaucracy—it's the minimum viable process that prevents agent chaos.
The repository-native approach is genuinely clever. By keeping mission artifacts in kitty-specs/, Spec Kitty makes AI collaboration as reviewable and version-controlled as the code itself. The git worktrees solve a real technical problem that every multi-tasking developer has felt. And the multi-agent support acknowledges what we're all discovering: no single agent is best at everything.
If you're currently copy-pasting from chat windows, losing requirements in scrolling history, or manually managing agent-generated branches, you're working harder than necessary. Get Spec Kitty from GitHub, run pipx install spec-kitty-cli, and try the Your First Feature tutorial.
Your future self—reviewing clean, spec-backed, properly merged code—will thank you.
Star the repo, open an issue if you hit snags, and welcome to spec-driven agent development. 🐱