Stop Letting AI Write Bloated Code: Karpathy's Rules for Claude
Stop Letting AI Write Bloated Code: Karpathy's Rules for Claude
You paste a simple request into Claude Code. "Add email validation to this form." Thirty seconds later, you're staring at 400 lines of new code: abstract base classes, plugin architectures, configuration registries, and error handling for scenarios that will literally never happen. Your three-field form now imports twelve new dependencies. And somewhere in the chaos, the original bug is still there—buried under layers of "flexible" architecture that nobody asked for.
Sound familiar? You're not alone. Andrej Karpathy—yes, the Andrej Karpathy, former Tesla AI Director and OpenAI founding member—publicly called out this exact failure mode. LLMs, he observed, don't manage confusion, don't surface tradeoffs, and absolutely love to overcomplicate everything they touch. They'll happily burn 1,000 lines on what should be 100, change code they don't understand as "side effects," and silently make wrong assumptions rather than ask clarifying questions.
The result? Code reviews from hell. Subtle bugs introduced by "helpful" orthogonal edits. Technical debt accumulated at machine speed. And you, the developer, left cleaning up an AI's creative explosion.
But what if you could flip the script? What if a single file—literally one CLAUDE.md—could reprogram Claude Code to write like a disciplined senior engineer instead of an overeager intern?
That's exactly what forrestchang/andrej-karpathy-skills delivers. Derived directly from Karpathy's observations on LLM coding pitfalls, this repository contains battle-tested principles that transform Claude Code from a bloated-code generator into a precision surgical instrument. And the best part? Installation takes under a minute.
What Is the Karpathy-Inspired Claude Code Guidelines?
The andrej-karpathy-skills repository is a minimalist yet devastatingly effective project: a single CLAUDE.md file that encodes four behavioral principles into Claude Code's system prompt. Created by developer Forrest Chang (follow him on X at @jiayuan_jy), it translates Andrej Karpathy's viral thread on LLM coding failures into actionable constraints that reshape how Claude approaches every task.
Karpathy's original observations weren't theoretical complaints—they were field notes from someone who has watched LLMs fail at scale. He noted that models "make wrong assumptions on your behalf and just run along with them without checking." They "don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should." Most damningly: "They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."
These aren't edge cases. They're systematic behavioral patterns baked into how LLMs are trained to be "helpful." The model's reward function optimizes for completeness and proactivity—which, in coding contexts, translates to overengineering and unwanted "improvements."
The andrej-karpathy-skills repository interrupts this failure mode at the prompt level. Rather than hoping Claude happens to write clean code, it mandates specific behaviors through structured principles. Think of it as a coding standards document that the AI actually follows—because it's embedded in its operating context.
The repository has gained rapid traction precisely because it solves a universal pain point. Every developer using AI coding assistants has experienced the "1,000-line solution to a 100-line problem" phenomenon. This project offers a concrete, tested remedy—not another vague "prompt engineering" blog post, but a drop-in file with immediate behavioral changes.
The Four Principles That Fix Everything
The guidelines organize around four core principles, each directly mapped to Karpathy's identified failure modes. Let's dissect what makes them effective.
1. Think Before Coding
The Problem: LLMs silently pick interpretations and execute, never surfacing uncertainty.
The Fix: Explicit reasoning requirements. The guidelines force Claude to state assumptions explicitly, present multiple interpretations when ambiguity exists, push back when simpler approaches are available, and—critically—stop and ask when confused rather than guessing.
This transforms Claude from a silent executor into a collaborative thinker. Instead of discovering after 200 lines that it misunderstood your intent, you get clarifying questions before implementation begins. The principle directly attacks Karpathy's observation that models "don't manage their confusion, don't seek clarifications."
2. Simplicity First
The Problem: LLMs default to overengineering—abstractions for single-use code, speculative flexibility, error handling for impossible scenarios.
The Fix: Brutal minimalism. No features beyond what was asked. No abstractions unless genuinely reused. No "configurability" that wasn't requested. The litmus test: "Would a senior engineer say this is overcomplicated? If yes, simplify."
This principle is where most AI-generated code fails hardest. The training data rewards comprehensive solutions; this guideline explicitly penalizes comprehensiveness beyond scope. It's counterintuitive to how LLMs are optimized, which is exactly why it works.
3. Surgical Changes
The Problem: LLMs "improve" adjacent code, refactor working systems, change comments they don't understand, and introduce side effects orthogonal to the actual task.
The Fix: Strict scope isolation. Touch only what you must. Match existing style even if you'd prefer different patterns. Notice unrelated dead code? Mention it—don't delete it. Remove only what your changes made unused, never pre-existing dead code unless explicitly asked.
Every changed line must trace directly to the user's request. This single principle eliminates the dreaded "drive-by refactoring" that makes code reviews nightmares and introduces subtle regressions.
4. Goal-Driven Execution
The Problem: Vague instructions produce vague results. "Make it work" leads to iterative clarification cycles that waste time.
The Fix: Transform imperative tasks into declarative, verifiable goals. "Add validation" becomes "Write tests for invalid inputs, then make them pass." "Fix the bug" becomes "Write a test that reproduces it, then make it pass." Multi-step tasks get explicit verification checkpoints.
This captures Karpathy's key insight: "LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go."
Real-World Scenarios Where These Guidelines Shine
Scenario 1: The Legacy Codebase Touch
You're adding a feature to a five-year-old Python monolith. Without guidelines, Claude sees the old-style string formatting and "helpfully" modernizes every file it touches—f-strings everywhere, type hints retrofitted, imports restructured. Your PR now has 47 changed files. Review takes three days. A subtle import cycle breaks production.
With Karpathy guidelines: Claude adds your feature using existing patterns. The diff shows 3 files, 40 lines. Review takes 20 minutes. Nothing breaks.
Scenario 2: The Startup MVP Sprint
You need user authentication by Friday. Claude generates a JWT-based auth system with refresh token rotation, OAuth2 integration hooks, role-based access control, and audit logging. It's beautiful. It's also 2,000 lines and will take two weeks to security-review.
With Karpathy guidelines: Claude implements session-based auth with bcrypt hashing—300 lines, production-ready for your stage, extensible when needed. You ship Friday.
Scenario 3: The Refactoring Trap
"Refactor this 200-line function" becomes a complete module rearchitecture with strategy patterns, dependency injection, and event-driven side effects. The original function is gone. So are three working features that depended on its specific behavior.
With Karpathy guidelines: Claude extracts two helper functions, renames variables for clarity, adds a test verifying pre/post behavior parity. The function is now 80 lines. Everything still works.
Scenario 4: The Confusing Requirements
Your product manager says "make the form better." Claude assumes "better" means real-time validation, auto-save, progressive enhancement, and accessibility overhaul. Three days later, you learn they meant "add a character counter."
With Karpathy guidelines: Claude stops. "'Better' is ambiguous. Possible interpretations: (1) validation improvements, (2) UX enhancements, (3) accessibility compliance. Which aspects are priorities?" You clarify in 30 seconds. Claude implements exactly what's needed.
Installation & Setup: Get Protected in Under 60 Seconds
The repository offers two installation paths depending on your workflow.
Option A: Claude Code Plugin (Recommended)
This makes the guidelines available across all projects automatically.
First, add the marketplace:
/plugin marketplace add forrestchang/andrej-karpathy-skills
Then install the plugin:
/plugin install andrej-karpathy-skills@karpathy-skills
That's it. Every Claude Code session now inherits these behavioral constraints.
Option B: Per-Project CLAUDE.md
For project-specific application or when you can't use plugins.
New project:
# Download the guidelines directly into your project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
Existing project (append to existing CLAUDE.md):
# Add separator for clarity
echo "" >> CLAUDE.md
# Append Karpathy guidelines
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
Cursor Users: You're Covered Too
The repository includes a committed Cursor project rule at .cursor/rules/karpathy-guidelines.mdc. Open the project in Cursor and the same principles apply automatically. See CURSOR.md for using the rule in other projects.
Real Code Examples: See the Guidelines in Action
Let's examine how these principles transform actual interactions. The repository's structure demonstrates the simplicity-first ethos—minimal files, maximum impact.
Example 1: The Goal-Driven Execution Pattern
Here's how multi-step tasks should be structured with verification loops:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
This isn't pseudocode—it's the literal format the guidelines recommend for complex tasks. Let's see it applied:
Without guidelines:
# Claude receives: "Add user authentication"
# Result: 2,000-line auth system with speculative features
class AuthenticationManager:
def __init__(self, strategy_factory, token_rotator, audit_logger):
self.strategy_factory = strategy_factory # Never used in MVP
self.token_rotator = token_rotator # Over-engineered
self.audit_logger = audit_logger # Not requested
self.oauth_providers = {} # Speculative
def authenticate(self, credentials, context=None, options=None):
# 150 lines of abstraction over simple password check
...
With guidelines applied:
# Claude receives: "Add user authentication"
# Claude's internal plan:
# 1. Write test for valid login → verify: test passes
# 2. Write test for invalid password → verify: test passes
# 3. Implement bcrypt password verification → verify: tests pass
# 4. Add session cookie → verify: integration test passes
import bcrypt
from flask import session
# Only what's needed. No speculation.
def verify_password(stored_hash: bytes, password: str) -> bool:
"""Constant-time password verification."""
return bcrypt.checkpw(password.encode(), stored_hash)
def login_user(username: str, password: str, users_db: dict) -> bool:
"""Authenticate and establish session."""
user = users_db.get(username)
if not user or not verify_password(user['password_hash'], password):
return False
# Minimal viable session
session['user_id'] = user['id']
return True
The difference? 400 lines versus 40. Every line traces to the original request. No speculative abstractions. Verifiable at each step.
Example 2: Project-Specific Customization
The guidelines explicitly encourage merging with your existing standards. Here's the pattern from the repository:
## Project-Specific Guidelines
- Use TypeScript strict mode
- All API endpoints must have tests
- Follow the existing error handling patterns in `src/utils/errors.ts`
This shows the design philosophy: foundational constraints + contextual specifics. The Karpathy principles prevent AI failure modes; your additions encode domain requirements.
Example 3: The Surgical Change Discipline
When editing existing code, the guidelines enforce strict boundaries. Consider this scenario:
# Original file with mixed concerns
import json
import os # ← unused, pre-existing
class DataProcessor:
def process(self, data):
# Your task: add CSV output option
return json.dumps(data)
def legacy_helper(self): # ← dead code, pre-existing
pass
Without guidelines: Claude "notices" the unused import, removes it, deletes legacy_helper, reformats with Black, and converts to dataclass. Your "add CSV output" PR has 40 changed lines.
With guidelines: Claude adds only the CSV method, uses existing style, leaves os import alone, mentions legacy_helper in comments but doesn't touch it.
import json
import os # Preserved despite being unused—wasn't my change
import csv # Only new import, directly for my feature
class DataProcessor:
def process(self, data):
return json.dumps(data)
def process_csv(self, data, output_path):
"""Write data to CSV—added per request."""
with open(output_path, 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
def legacy_helper(self): # Untouched—not my change
pass
The diff: 12 lines added, 0 removed. Exactly what was asked.
Advanced Usage & Pro Tips
Layering with Existing CLAUDE.md Files
Don't replace your project's existing context—append these guidelines. The repository specifically recommends this pattern for accumulated project wisdom:
# Preserve existing project context
cat existing-claude.md karpathy-guidelines.md > CLAUDE.md
The Judgment Exception
The guidelines include a critical escape hatch: "For trivial tasks (simple typo fixes, obvious one-liners), use judgment—not every change needs the full rigor." This prevents the guidelines themselves from becoming bureaucracy. The bias is caution over speed for non-trivial work—not slowness for everything.
Verification-Driven Development
The most powerful advanced pattern: always pair guidelines with test commands. Configure your environment so Claude can execute tests autonomously:
# In your CLAUDE.md or project config
## Testing
- Run `pytest` after implementation changes
- Run `pytest --watch` during development
- Coverage minimum: 80% for new code
This lets the "Goal-Driven Execution" principle fully activate—Claude writes tests, implements, verifies, loops.
Cursor-Specific: Cross-Project Rules
The .cursor/rules/karpathy-guidelines.mdc file can be symlinked or copied across projects. Maintain one canonical version, distribute everywhere:
# Global Cursor rules directory
mkdir -p ~/.cursor/rules
cp karpathy-guidelines.mdc ~/.cursor/rules/
How It Compares: Why Not Just Prompt Better?
| Approach | Effectiveness | Maintenance | Portability | Behavioral Depth |
|---|---|---|---|---|
| Manual prompting | Low—forgets between sessions | High—repeat every time | None | Surface-level |
| Custom GPT/Claude instructions | Medium—project-agnostic | Medium—one-time setup | Limited to platform | Medium |
| Karpathy CLAUDE.md | High—embedded in context | Low—single file | Universal—any project | Deep—structural constraints |
| IDE linting rules | Medium—catches symptoms | High—complex configs | Varies by tool | Shallow—syntax only |
| Code review enforcement | High—human judgment | Very high—time intensive | Universal | Deep—but reactive |
The key differentiator: proactive behavioral shaping versus reactive correction. Linting catches overengineering after it's written. These guidelines prevent it from being written. Manual prompting relies on your memory and consistency; this file is always present, always enforced.
Compared to platform-specific custom instructions, the CLAUDE.md approach is tool-agnostic—works with Claude Code, Cursor, or any future tool reading project context files.
Frequently Asked Questions
Will this slow down Claude Code for simple tasks?
No. The guidelines explicitly include a judgment clause for trivial changes. A typo fix doesn't need the full reasoning ritual. The bias is caution for non-trivial work where mistakes are costly.
Can I use this with GitHub Copilot or other AI tools?
The CLAUDE.md format is Claude Code-specific, but the principles translate. For Copilot, you'd need to adapt into comments or docstrings. The Cursor .mdc rule format works in Cursor IDE. The core value is the principles, not the specific file format.
What if my project already has a CLAUDE.md?
Append these guidelines. The repository specifically recommends this—project-specific rules layer on top of foundational behavioral constraints. Use echo "" >> CLAUDE.md and curl-append as shown in the installation section.
Do I need to understand Andrej Karpathy's original thread to use this?
Not at all. The repository distills his observations into actionable rules. Reading the original thread adds context but isn't required for effective use.
How do I know the guidelines are actually working?
Watch for four signals: fewer unnecessary changes in diffs, simpler first-attempt code, clarifying questions before implementation, and clean PRs without drive-by refactoring. The repository provides these exact verification criteria.
Can I modify the guidelines for my team's needs?
Absolutely—MIT licensed. The repository encourages customization, showing exactly how to add project-specific sections. The core four principles are foundational; adapt the specifics to your domain.
Does this work for non-coding tasks with Claude?
The principles generalize—"Think Before Acting," "Simplicity First," "Surgical Changes," and "Goal-Driven Execution" apply broadly. However, the specific formulations are optimized for software engineering contexts.
The Bottom Line
Andrej Karpathy didn't just identify why LLMs write bad code—he described a fundamental mismatch between how models are trained and how software should be built. The andrej-karpathy-skills repository bridges that gap with elegant minimalism: one file, four principles, immediate behavioral transformation.
You have two choices. Keep accepting AI-generated code that silently assumes, overcomplicates, and expands scope without permission. Or install these guidelines once, spend 30 seconds on setup, and never again discover that "simple fix" became a 1,000-line architecture expedition.
The repository is waiting at github.com/forrestchang/andrej-karpathy-skills. The plugin install takes two commands. The per-project setup takes one curl.
Your future self—reviewing clean, minimal diffs that do exactly what was asked—will thank you.
Install it now. Stop the bloat before your next prompt.
Tags
Comments (0)
No comments yet. Be the first to share your thoughts!