Spec Kitty: The CLI Workflow AI Coding Agents Desperately Need

Bright Coding
Author

Your AI coding agent just forgot the requirements again, didn't it?

You've been there. Thirty minutes into a Claude or Cursor session, and suddenly the agent is rewriting code you already approved. Or worse—it's building features you never asked for while ignoring the critical acceptance criteria buried somewhere in chat history. The context window choked. Your requirements evaporated. And now you're manually stitching together fragments of intent from a scrolling graveyard of forgotten prompts.

Here's the brutal truth: AI coding agents are incredibly powerful and fundamentally chaotic without structure. They don't forget on purpose, but they absolutely will forget. Every. Single. Time. Without a system.

That's exactly why developers are quietly abandoning raw chat interfaces and flocking to a new open-source CLI that's turning product intent into repeatable, reviewable, mergeable agent workflows. It's called Spec Kitty—and if you're serious about shipping production software with AI assistance, you need to understand what it does differently.


What Is Spec Kitty?

Spec Kitty is an open-source CLI tool for spec-driven development with AI coding agents, created by Priivacy AI and available on GitHub at github.com/Priivacy-ai/spec-kitty. Built in Python 3.11+, it transforms how developers collaborate with Claude, Cursor, Gemini, Codex, and a growing ecosystem of AI coding tools.

The core philosophy is elegantly simple: your repository becomes the source of truth for everything. Not a chat window. Not a cloud service you don't control. Your actual Git repository stores specs, plans, tasks, reviews, and merge state—making AI collaboration traceable, reproducible, and team-friendly.

Spec Kitty is trending right now because it solves the exact pain point that exploded in 2024-2025: developers adopted AI coding agents faster than they adopted workflows to manage them. Everyone got excited about vibe coding, then everyone hit the same wall—unstructured agent sessions produce unstructured, unreviewable, unmergeable code.

The tool enforces a disciplined lifecycle:

spec → plan → tasks → next → review → accept → merge

This isn't bureaucracy for bureaucracy's sake. It's the minimum viable process that prevents agents from spiraling into chaos. And it does this while staying local-first—no mandatory cloud dependencies, no vendor lock-in, no surprise subscription fees.


Key Features That Separate Spec Kitty from the Chaos

Repository-Native Mission Artifacts

Every spec, plan, and task lives under kitty-specs/ in your repo. This means:

  • Git history tracks your decisions—no more "what did we agree on in hour three?"
  • Code reviews include the spec—reviewers see intent, not just implementation
  • CI/CD can validate against acceptance criteria—automated gates become possible
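To make the "Git history tracks your decisions" point concrete, here is a minimal sketch using plain Git in a throwaway repository. The mission slug demo-mission and the spec contents are made up for illustration; only the kitty-specs/ directory name comes from the tool as described here.

```shell
# Throwaway repo; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Local identity so commits work on machines without a global git config.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical mission slug and spec contents, for illustration only.
mkdir -p kitty-specs/demo-mission
echo "Requirement: users can add tasks" > kitty-specs/demo-mission/spec.md
git add kitty-specs
git commit -q -m "spec: initial requirements"

echo "Requirement: tasks persist across restarts" >> kitty-specs/demo-mission/spec.md
git commit -q -am "spec: add persistence requirement"

# Every change to the spec is an ordinary commit you can list, diff, and blame.
spec_commits=$(git log --oneline -- kitty-specs/ | wc -l | tr -d ' ')
git log --oneline -- kitty-specs/
```

Because the spec is just a tracked file, the same `git log -- kitty-specs/` filter works in code review and in CI.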

Git Worktrees for Isolated Implementation

Spec Kitty creates isolated git worktrees under .worktrees/. Here's why this matters technically:

  • No branch-switching overhead—agents work in parallel workspaces without touching your main working tree
  • Clean context boundaries—each mission gets its own filesystem, preventing cross-contamination
  • Easy comparison and rollback—review changes in isolation before any merge risk
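The isolation described above can be reproduced with plain `git worktree`. This is a sketch of the underlying Git mechanism, not Spec Kitty's own implementation; the branch name demo-mission and the file app.txt are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Create an isolated checkout on a new branch under .worktrees/.
git worktree add -b demo-mission .worktrees/demo-mission >/dev/null 2>&1

# Edit and commit inside the worktree only.
echo "v2 from the agent" > .worktrees/demo-mission/app.txt
git -C .worktrees/demo-mission commit -q -am "agent change"

# The main working tree is untouched until you merge.
main_copy=$(cat app.txt)
worktree_copy=$(cat .worktrees/demo-mission/app.txt)
echo "main: $main_copy | worktree: $worktree_copy"
```

Both checkouts share one object store, so the agent's commits are visible to `git log demo-mission` from the main tree even though its files never appear there.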

Kanban Dashboard with spec-kitty dashboard

A local web dashboard visualizes mission progress across lifecycle lanes: planned, in_progress, for_review, and done. For teams juggling multiple agent missions, this replaces the spreadsheet hacks and mental juggling acts.
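The four lanes form a simple one-way progression. The helper below is a hypothetical illustration of that ordering, not code from Spec Kitty; only the lane names are taken from the description above.

```shell
# next_lane LANE: print the lane that follows LANE.
next_lane() {
  case "$1" in
    planned)     echo "in_progress" ;;
    in_progress) echo "for_review" ;;
    for_review)  echo "done" ;;
    "done")      echo "done" ;;   # terminal lane
    *)           echo "unknown lane: $1" >&2; return 1 ;;
  esac
}

# Walk a task from the first lane to the last.
lane="planned"
while [ "$lane" != "done" ]; do
  new_lane=$(next_lane "$lane")
  echo "$lane -> $new_lane"
  lane=$new_lane
done
```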

Multi-Agent Orchestration

Spec Kitty supports slash commands for Claude, Cursor, Gemini, Codex, Copilot, OpenCode, Qwen, Windsurf, Kiro, Vibe, Pi, Letta, and more. The --agent flag lets you route work to different agents based on their strengths—Claude for architecture, Cursor for implementation, Gemini for testing, for example.

Auto-Merge with --push

Once review and acceptance are complete, spec-kitty merge --push handles the final integration. The entire lifecycle becomes scriptable and automatable.


Real-World Use Cases Where Spec Kitty Shines

Use Case 1: The Vanishing Requirements Problem

Scenario: You're building a payment integration. The agent implemented Stripe, but forgot the PCI compliance requirements from your original prompt. Three hours of refactoring later, you discover the audit trail requirement was never implemented.

Spec Kitty fix: /spec-kitty.specify creates a persistent spec document. The agent references it throughout. Acceptance criteria live in kitty-specs/, not chat history.

Use Case 2: The Multi-Developer Agent Team

Scenario: Three developers on your team all use Cursor, but each starts from different assumptions. One agent refactors the API while another builds features assuming the old API. Merge conflicts become architectural conflicts.

Spec Kitty fix: Shared kitty-specs/ in the repo means all agents read the same plan. Work packages have clear boundaries. The spec-kitty next command ensures agents don't step on each other.

Use Case 3: The "It Worked in Chat" Deployment Disaster

Scenario: Your agent generated beautiful code that passed all its self-tests. You copy-pasted it into the repo. Production broke because the agent never saw your actual environment variables, database schema, or existing middleware.

Spec Kitty fix: Worktrees use your actual repo context. Agents work with real files, real configs, real dependencies. The verify-setup command catches environment mismatches before work begins.

Use Case 4: The Review Nightmare

Scenario: You need to review 500 lines of agent-generated code. No comments explain why choices were made. The commit message says "update." Your security team rejects it outright.

Spec Kitty fix: The spec → plan → tasks chain creates natural documentation. Reviewers see the original intent, the planned approach, and the implementation. The /spec-kitty.review command structures the review process, and /spec-kitty.accept records explicit acceptance.


Step-by-Step Installation & Setup Guide

Prerequisites

  • Python 3.11 or newer
  • Git (obviously—you're using Git, right?)
  • A working AI coding agent (Claude, Cursor, Gemini, etc.)
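A quick preflight check for the first two prerequisites might look like the sketch below. The version_ge helper is a hypothetical name introduced here; it compares only the major and minor components, which is enough for a "3.11 or newer" test.

```shell
# version_ge ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (MAJOR.MINOR).
version_ge() {
  actual_major=${1%%.*}
  actual_rest=${1#*.}
  actual_minor=${actual_rest%%.*}
  required_major=${2%%.*}
  required_rest=${2#*.}
  required_minor=${required_rest%%.*}
  if [ "$actual_major" -ne "$required_major" ]; then
    [ "$actual_major" -gt "$required_major" ]
  else
    [ "$actual_minor" -ge "$required_minor" ]
  fi
}

# Report each prerequisite without aborting, so every problem shows at once.
if command -v git >/dev/null 2>&1; then
  echo "git: found"
else
  echo "git: missing"
fi

if command -v python3 >/dev/null 2>&1; then
  py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if version_ge "$py_version" "3.11"; then
    echo "python3 $py_version: ok"
  else
    echo "python3 $py_version: too old, need 3.11 or newer"
  fi
else
  echo "python3: missing"
fi
```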

Install the CLI

Recommended method with pipx (isolates dependencies, avoids system conflicts):

# Install pipx if you don't have it
# (on Debian/Ubuntu, `sudo apt install pipx` is the usual route)
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# Install Spec Kitty
pipx install spec-kitty-cli

Alternative with uv (blazing fast, modern Python tooling):

uv tool install spec-kitty-cli

Fallback with pip (only inside activated virtual environments):

python -m pip install spec-kitty-cli

⚠️ Why pipx over pip? Modern Linux distributions mark system Python as "externally managed." pipx avoids the externally-managed-environment errors that plague bare pip install attempts.
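If you want to know in advance whether bare pip will be refused, you can look for the marker file that PEP 668 defines. This is a small sketch, assuming python3 is on your PATH; the variable names are illustrative.

```shell
# PEP 668: a distribution marks its Python as externally managed by placing
# an EXTERNALLY-MANAGED file in the interpreter's stdlib directory.
marker=$(python3 -c 'import os, sysconfig; print(os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED"))')

if [ -f "$marker" ]; then
  externally_managed="yes"
  echo "System Python is externally managed: use pipx, uv, or a virtual environment."
else
  externally_managed="no"
  echo "No EXTERNALLY-MANAGED marker found at $marker"
fi
```

Inside an activated virtual environment pip skips this check, which is why the pip fallback above is limited to virtual environments.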

Initialize Your First Project

# Create a new project with your preferred agent
spec-kitty init my-project --ai claude

# Or add to existing repository
cd existing-repo
spec-kitty init . --ai cursor

Verify Everything Works

cd my-project
spec-kitty verify-setup

This checks your installation, project wiring, and agent configuration. Fix any red flags before proceeding.

Configure Your Environment (Optional but Recommended)

For template development or custom setups:

# Point to custom templates during development
export SPEC_KITTY_TEMPLATE_ROOT="$(pwd)"
spec-kitty init my-project --ai claude

REAL Code Examples from the Repository

Let's walk through Spec Kitty's actual workflow using the exact commands from the README, with detailed explanations of what happens under the hood.

Example 1: The Core Workflow Initialization

/spec-kitty.charter
/spec-kitty.specify Build a small task list app.
/spec-kitty.plan
/spec-kitty.tasks

What's happening here:

These are slash commands interpreted by your AI coding agent (not shell commands). They trigger Spec Kitty's structured workflow:

  • /spec-kitty.charter — Establishes the mission context. The agent reads any existing project charter and understands the high-level constraints, tech stack, and conventions.

  • /spec-kitty.specify "Build a small task list app.": This is the critical step most developers skip. The agent creates a formal specification document under kitty-specs/<mission-slug>/spec.md. This spec includes:

    • Functional requirements (what the app does)
    • Non-functional requirements (performance, security, accessibility)
    • Acceptance criteria (how you know it's done)
    • Explicit exclusions (what's out of scope—prevents scope creep!)
  • /spec-kitty.plan — The agent decomposes the spec into a technical plan: architecture decisions, file structure, dependency choices, and implementation order. This lives in kitty-specs/<mission-slug>/plan.md.

  • /spec-kitty.tasks — The plan becomes concrete tasks with lifecycle states. Each task gets a unique identifier, estimated complexity, and dependency graph. Tasks live in kitty-specs/<mission-slug>/tasks/.

Why this matters: Without these steps, your agent is improvising. With them, every subsequent action has a documented rationale you can review, challenge, and improve.
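Since the three artifacts live at predictable paths, a CI job or pre-merge hook can check that they exist. The sketch below assumes the layout described above (spec.md, plan.md, and a tasks/ directory per mission); the check_mission helper and the two demo mission names are hypothetical.

```shell
# check_mission DIR: succeed only if DIR contains spec.md, plan.md, and tasks/.
check_mission() {
  missing=0
  [ -f "$1/spec.md" ] || { echo "$1: missing spec.md"; missing=1; }
  [ -f "$1/plan.md" ] || { echo "$1: missing plan.md"; missing=1; }
  [ -d "$1/tasks" ]   || { echo "$1: missing tasks/"; missing=1; }
  return "$missing"
}

# Demo in a scratch directory with one complete and one incomplete mission.
scratch=$(mktemp -d)
mkdir -p "$scratch/kitty-specs/complete-mission/tasks" "$scratch/kitty-specs/half-done"
touch "$scratch/kitty-specs/complete-mission/spec.md" \
      "$scratch/kitty-specs/complete-mission/plan.md" \
      "$scratch/kitty-specs/half-done/spec.md"

failures=0
for mission in "$scratch"/kitty-specs/*/; do
  check_mission "${mission%/}" || failures=$((failures + 1))
done
echo "missions with missing artifacts: $failures"
```

In a real pipeline you would point the loop at the repository's own kitty-specs/ directory and fail the job when the count is non-zero.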

Example 2: The Runtime Execution Loop

spec-kitty next --agent claude --mission <mission-slug>

What's happening here:

This is where Spec Kitty's intelligence shines. Unlike raw agent chat where you manually prompt each step, next queries the mission state and determines the optimal next action:

# The CLI examines:
# 1. Current task states (planned → in_progress → for_review → done)
# 2. Git worktree status (clean? conflicts?)
# 3. Agent capabilities (what can Claude do best?)
# 4. Dependencies (are prerequisite tasks complete?)

spec-kitty next --agent claude --mission task-list-app-v1

Possible outputs the CLI might return:

Next action: Implement task creation endpoint
  Task: TASK-003-create-endpoint
  Worktree: .worktrees/task-list-app-v1-TASK-003/
  Estimated: 15 minutes
  Run: spec-kitty execute --task TASK-003

Or if review is pending:

Next action: Review completed task
  Task: TASK-002-data-model (status: for_review)
  Review file: kitty-specs/task-list-app-v1/reviews/TASK-002.md
  Run: spec-kitty review --task TASK-002

The --agent flag is crucial for multi-agent teams. You might run:

# Claude handles architecture and complex logic
spec-kitty next --agent claude --mission api-redesign

# Cursor handles UI implementation
spec-kitty next --agent cursor --mission dashboard-v2

# Gemini handles test generation
spec-kitty next --agent gemini --mission test-coverage
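If you route work this way often, the mapping can live in a small wrapper. The sketch below only prints the command it would run, so it works without the CLI installed; agent_for, next_command, and the task-type labels are hypothetical names, while the `spec-kitty next --agent ... --mission ...` form is the one shown above.

```shell
# agent_for TASK_TYPE: hypothetical mapping from kind of work to agent name.
agent_for() {
  case "$1" in
    architecture)   echo "claude" ;;
    implementation) echo "cursor" ;;
    tests)          echo "gemini" ;;
    *)              echo "claude" ;;   # arbitrary default for this sketch
  esac
}

# next_command TASK_TYPE MISSION: print the command instead of running it,
# so the routing can be reviewed before anything executes.
next_command() {
  echo "spec-kitty next --agent $(agent_for "$1") --mission $2"
}

next_command architecture api-redesign
next_command implementation dashboard-v2
next_command tests test-coverage
```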

Example 3: The Acceptance and Merge Pipeline

/spec-kitty.review
/spec-kitty.accept
/spec-kitty.merge --push

What's happening here:

This three-stage gate prevents the "agent said it's done so I guess it's done" anti-pattern:

  • /spec-kitty.review — Structured review against acceptance criteria. The agent (or human!) checks each criterion from the original spec. Results are recorded in kitty-specs/<mission-slug>/reviews/.

  • /spec-kitty.accept — Formal sign-off. This creates an immutable acceptance record with timestamp, reviewer identity, and any noted exceptions. Think of it as a lightweight audit trail.

  • /spec-kitty.merge --push: Atomic integration. The worktree changes merge into the main branch and, with --push, are pushed to origin. No manual copy-paste. No "did I get all the files?" anxiety.

Behind the scenes, this executes approximately:

# Spec Kitty handles the git orchestration
git -C .worktrees/<mission-worktree>/ diff --name-only  # What changed?
git merge-tree $(git merge-base HEAD <worktree-branch>) HEAD <worktree-branch>  # Clean merge?
git merge --no-ff <worktree-branch> -m "feat: <mission-name> [accepted by <agent>]"
git push origin HEAD  # If --push specified
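To try that sequence safely, here is a self-contained version that runs in a throwaway repository. It uses only standard Git commands and makes no claim about what Spec Kitty actually runs internally; the branch demo-mission and the file names are hypothetical, and the push step is left out because the scratch repo has no remote.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Hypothetical mission branch checked out in its own worktree.
git worktree add -b demo-mission .worktrees/demo-mission >/dev/null 2>&1
echo "feature" > .worktrees/demo-mission/feature.txt
git -C .worktrees/demo-mission add feature.txt
git -C .worktrees/demo-mission commit -q -m "add feature"

# What would the merge bring in? (diff from the merge base to the branch tip)
changed=$(git diff --name-only "$main_branch"...demo-mission)

# --no-ff forces a merge commit, so the mission stays visible in history.
git merge -q --no-ff -m "feat: demo-mission" demo-mission

# rev-list prints the commit followed by its parents: 3 words = 2 parents.
commit_and_parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
echo "changed: $changed | commit + parents: $commit_and_parents"
```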

Example 4: Development Setup from Source

git clone https://github.com/Priivacy-ai/spec-kitty.git
cd spec-kitty
pip install -e ".[test]"

What's happening here:

This installs Spec Kitty in editable mode with test dependencies. The -e flag points the installed package at your source checkout instead of copying files, so code changes take effect immediately without reinstalling.

# The .[test] extras install:
# - pytest and plugins
# - coverage tools
# - linting (likely ruff, mypy)
# - any test fixtures or mocks

# Verify your development environment
pytest  # Run full test suite
spec-kitty --help  # Confirm CLI is accessible

For template hacking, the environment variable override is essential:

export SPEC_KITTY_TEMPLATE_ROOT="$(pwd)"
spec-kitty init my-project --ai claude
# Now uses YOUR modified templates, not the packaged defaults

Advanced Usage & Best Practices

Pro Tip 1: Mission Granularity

Don't create monolithic missions. A good mission scope is 2-4 hours of focused work. Larger missions cause context window pressure; smaller ones create overhead. The /spec-kitty.plan output should feel ambitious but achievable.

Pro Tip 2: Agent Specialization Matrix

Build a team-of-agents mental model:

| Task Type | Recommended Agent | Why |
|---|---|---|
| Architecture & refactoring | Claude | Superior reasoning, longer context |
| Feature implementation | Cursor | Excellent code generation, IDE integration |
| Test generation | Gemini | Fast, thorough edge-case coverage |
| Documentation | Any with /spec-kitty.specify | Structured output matters more than creativity |
| Security review | Dedicated security agent | Fresh perspective, adversarial thinking |

Pro Tip 3: Dashboard-Driven Standups

Run spec-kitty dashboard before team syncs. The kanban view exposes blockers instantly:

# Start dashboard (opens browser)
spec-kitty dashboard

# Or get quick terminal summary
spec-kitty status --all-missions

Pro Tip 4: Upgrade Path Discipline

When Spec Kitty releases updates:

pipx upgrade spec-kitty-cli        # Update CLI
spec-kitty upgrade                  # Migrate existing projects
spec-kitty verify-setup             # Confirm everything works

The upgrade command handles template migrations and config format changes—don't skip it.

Pro Tip 5: External Orchestrator Integration

For teams with existing CI/CD or project management tools, Spec Kitty supports external orchestrators. See the External Orchestrator Runbook for wiring into Jenkins, GitHub Actions, Linear, or Jira.


Comparison with Alternatives

| Feature | Spec Kitty | Raw Chat (Claude/Cursor) | GitHub Copilot Workspace | LangChain/LangGraph |
|---|---|---|---|---|
| Spec persistence | ✅ Repository-native | ❌ Chat history only | ⚠️ PR descriptions | ✅ Custom implementation |
| Git worktrees | ✅ Built-in | ❌ Manual | ❌ Branch-based | ❌ Manual |
| Local-first | ✅ No cloud required | | ⚠️ GitHub-dependent | |
| Multi-agent support | ✅ 12+ agents | ❌ Single session | ⚠️ Copilot only | ✅ With custom code |
| Kanban dashboard | ✅ Local | | | ❌ Build yourself |
| Structured review/accept | ✅ Formal gates | ❌ Informal | ⚠️ PR reviews | ✅ Custom implementation |
| Setup complexity | ⚠️ CLI install | ✅ Zero | ✅ GitHub native | ❌ Significant |
| Open source | ✅ MIT | N/A | | ✅ Various |

When to choose Spec Kitty over alternatives:

  • vs. Raw Chat: You ship production code, not prototypes. You need audit trails and team coordination.
  • vs. Copilot Workspace: You want vendor independence, local control, and structured lifecycle management.
  • vs. LangChain: You need an opinionated workflow today, not a framework to build one over weeks.

FAQ

Is Spec Kitty free?

Yes. Spec Kitty is MIT-licensed open source. You pay nothing for the CLI. You still need access to your chosen AI agents (a Claude Pro plan or a Cursor subscription, for example).

Does it work with my existing codebase?

Absolutely. Run spec-kitty init . --ai <agent> in any Git repository. It adds kitty-specs/ and .worktrees/ to your existing structure without disrupting current workflows.

What if my team doesn't use AI agents yet?

Spec Kitty works for human-driven spec-driven development too. The structured workflow benefits any team writing specs before code. Add agents when you're ready.

How does this compare to Jira or Linear?

Jira/Linear track work; Spec Kitty orchestrates execution. They're complementary—Spec Kitty can sync to external trackers (see Hosted Sync Workspaces), but its unique value is the agent-facing workflow and git-native storage.

Can I use multiple agents on one mission?

Yes, strategically. Use --agent to route specific tasks. However, each task should have one responsible agent to maintain coherence. The multi-agent orchestration docs cover patterns for handoffs.

What happens if spec-kitty next chooses wrong?

You're always in control. next is a recommendation, not a command. Override with explicit spec-kitty execute --task <specific-task> or adjust task priorities in the dashboard.

Is my code sent to Priivacy AI's servers?

No. Spec Kitty is local-first. Your specs, plans, and code stay in your repository. Only optional hosted sync features (if you enable them) would involve external services.


Conclusion: The Missing Piece in Your AI Coding Stack

Here's what I believe after digging deep into Spec Kitty: We've been using AI coding agents like they're magic typewriters, when they're actually junior developers who need management. The agents aren't the problem. The workflow is.

Spec Kitty gives you that workflow without the enterprise baggage. It's lightweight enough for solo developers, structured enough for teams, and principled enough for production software. The spec → plan → tasks → next → review → accept → merge lifecycle isn't bureaucracy—it's the minimum viable process that prevents agent chaos.

The repository-native approach is genuinely clever. By keeping mission artifacts in kitty-specs/, Spec Kitty makes AI collaboration as reviewable and version-controlled as the code itself. The git worktrees solve a real technical problem that every multi-tasking developer has felt. And the multi-agent support acknowledges what we're all discovering: no single agent is best at everything.

If you're currently copy-pasting from chat windows, losing requirements in scrolling history, or manually managing agent-generated branches, you're working harder than necessary. Get Spec Kitty from GitHub, run pipx install spec-kitty-cli, and try the Your First Feature tutorial.

Your future self—reviewing clean, spec-backed, properly merged code—will thank you.

Star the repo, open an issue if you hit snags, and welcome to spec-driven agent development. 🐱
