Stop Letting AI Write Bloated Code: Karpathy's Rules for Claude

You paste a simple request into Claude Code. "Add email validation to this form." Thirty seconds later, you're staring at 400 lines of new code: abstract base classes, plugin architectures, configuration registries, and error handling for scenarios that will literally never happen. Your three-field form now imports twelve new dependencies. And somewhere in the chaos, the original bug is still there—buried under layers of "flexible" architecture that nobody asked for.

Sound familiar? You're not alone. Andrej Karpathy—yes, the Andrej Karpathy, former Tesla AI Director and OpenAI founding member—publicly called out this exact failure mode. LLMs, he observed, don't manage confusion, don't surface tradeoffs, and absolutely love to overcomplicate everything they touch. They'll happily burn 1,000 lines on what should be 100, change code they don't understand as "side effects," and silently make wrong assumptions rather than ask clarifying questions.

The result? Code reviews from hell. Subtle bugs introduced by "helpful" orthogonal edits. Technical debt accumulated at machine speed. And you, the developer, left cleaning up an AI's creative explosion.

But what if you could flip the script? What if a single file—literally one CLAUDE.md—could reprogram Claude Code to write like a disciplined senior engineer instead of an overeager intern?

That's exactly what forrestchang/andrej-karpathy-skills delivers. Derived directly from Karpathy's observations on LLM coding pitfalls, this repository contains battle-tested principles that transform Claude Code from a bloated-code generator into a precision surgical instrument. And the best part? Installation takes under a minute.

What Are the Karpathy-Inspired Claude Code Guidelines?

The andrej-karpathy-skills repository is a deliberately small project: a single CLAUDE.md file that loads four behavioral principles into Claude Code's working context. Created by developer Forrest Chang (follow him on X at @jiayuan_jy), it translates Andrej Karpathy's widely shared post on LLM coding failures into actionable constraints that reshape how Claude approaches every task.

Karpathy's original observations weren't theoretical complaints—they were field notes from someone who has watched LLMs fail at scale. He noted that models "make wrong assumptions on your behalf and just run along with them without checking." They "don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should." Most damningly: "They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."

These aren't edge cases. They're systematic behavioral patterns that plausibly follow from how LLMs are trained to be "helpful." Training that rewards completeness and proactivity translates, in coding contexts, to overengineering and unwanted "improvements."

The andrej-karpathy-skills repository interrupts this failure mode at the prompt level. Rather than hoping Claude happens to write clean code, it mandates specific behaviors through structured principles. Think of it as a coding standards document that the AI actually follows—because it's embedded in its operating context.

The repository has gained rapid traction precisely because it solves a universal pain point. Every developer using AI coding assistants has experienced the "1,000-line solution to a 100-line problem" phenomenon. This project offers a concrete, tested remedy—not another vague "prompt engineering" blog post, but a drop-in file with immediate behavioral changes.

The Four Principles That Fix Everything

The guidelines organize around four core principles, each directly mapped to Karpathy's identified failure modes. Let's dissect what makes them effective.

1. Think Before Coding

The Problem: LLMs silently pick interpretations and execute, never surfacing uncertainty.

The Fix: Explicit reasoning requirements. The guidelines force Claude to state assumptions explicitly, present multiple interpretations when ambiguity exists, push back when simpler approaches are available, and—critically—stop and ask when confused rather than guessing.

This transforms Claude from a silent executor into a collaborative thinker. Instead of discovering after 200 lines that it misunderstood your intent, you get clarifying questions before implementation begins. The principle directly attacks Karpathy's observation that models "don't manage their confusion, don't seek clarifications."

2. Simplicity First

The Problem: LLMs default to overengineering—abstractions for single-use code, speculative flexibility, error handling for impossible scenarios.

The Fix: Brutal minimalism. No features beyond what was asked. No abstractions unless genuinely reused. No "configurability" that wasn't requested. The litmus test: "Would a senior engineer say this is overcomplicated? If yes, simplify."

This principle is where most AI-generated code fails hardest. The training data rewards comprehensive solutions; this guideline explicitly penalizes comprehensiveness beyond scope. It's counterintuitive to how LLMs are optimized, which is exactly why it works.
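As a rough illustration of the scope this principle aims for, here is a minimal sketch of the email check from the opening example. The function name `is_valid_email` and the deliberately loose pattern are assumptions made for illustration, not code from the repository; stricter address validation would be a separate request.

```python
import re

# Deliberately loose pattern (an assumption for this sketch): exactly one "@",
# no whitespace, and at least one dot in the domain part.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """Return True if value looks like an email address."""
    return _EMAIL_RE.fullmatch(value) is not None
```

One function, one standard-library import, no registry or plugin layer. If the form later needs more, that is a new request with its own scope.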

3. Surgical Changes

The Problem: LLMs "improve" adjacent code, refactor working systems, change comments they don't understand, and introduce side effects orthogonal to the actual task.

The Fix: Strict scope isolation. Touch only what you must. Match existing style even if you'd prefer different patterns. Notice unrelated dead code? Mention it—don't delete it. Remove only what your changes made unused, never pre-existing dead code unless explicitly asked.

Every changed line must trace directly to the user's request. This single principle eliminates the dreaded "drive-by refactoring" that makes code reviews nightmares and introduces subtle regressions.

4. Goal-Driven Execution

The Problem: Vague instructions produce vague results. "Make it work" leads to iterative clarification cycles that waste time.

The Fix: Transform imperative tasks into declarative, verifiable goals. "Add validation" becomes "Write tests for invalid inputs, then make them pass." "Fix the bug" becomes "Write a test that reproduces it, then make it pass." Multi-step tasks get explicit verification checkpoints.

This captures Karpathy's key insight: "LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go."
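The "write a test that reproduces it, then make it pass" loop might look like the sketch below. The `parse_quantity` helper and its thousands-separator bug are hypothetical, invented only to show the shape of the workflow.

```python
def parse_quantity(text: str) -> int:
    """Parse a quantity such as "1,000" into an int.

    Hypothetical bug: the original body was int(text), which raised
    ValueError on thousands separators. The fix is the replace() call.
    """
    return int(text.replace(",", ""))


def test_thousands_separator():
    # Written first: this reproduced the bug and failed before the fix.
    assert parse_quantity("1,000") == 1000


def test_plain_number_still_works():
    # Guards the behavior that already worked.
    assert parse_quantity("42") == 42
```

The failing test is the success criterion: the task is done when it passes and the existing test still does.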

Real-World Scenarios Where These Guidelines Shine

Scenario 1: The Legacy Codebase Touch

You're adding a feature to a five-year-old Python monolith. Without guidelines, Claude sees the old-style string formatting and "helpfully" modernizes every file it touches—f-strings everywhere, type hints retrofitted, imports restructured. Your PR now has 47 changed files. Review takes three days. A subtle import cycle breaks production.

With Karpathy guidelines: Claude adds your feature using existing patterns. The diff shows 3 files, 40 lines. Review takes 20 minutes. Nothing breaks.

Scenario 2: The Startup MVP Sprint

You need user authentication by Friday. Claude generates a JWT-based auth system with refresh token rotation, OAuth2 integration hooks, role-based access control, and audit logging. It's beautiful. It's also 2,000 lines and will take two weeks to security-review.

With Karpathy guidelines: Claude implements session-based auth with bcrypt hashing—300 lines, production-ready for your stage, extensible when needed. You ship Friday.

Scenario 3: The Refactoring Trap

"Refactor this 200-line function" becomes a complete module rearchitecture with strategy patterns, dependency injection, and event-driven side effects. The original function is gone. So are three working features that depended on its specific behavior.

With Karpathy guidelines: Claude extracts two helper functions, renames variables for clarity, adds a test verifying pre/post behavior parity. The function is now 80 lines. Everything still works.
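A behavior-parity test of the kind described here can be sketched as follows. The `summarize_legacy` and `summarize_refactored` functions and the sample orders are made up for illustration; the point is only that the old and new versions are run on the same inputs and compared.

```python
def summarize_legacy(orders):
    """Original function: total and count of paid orders, in one loop."""
    total = 0
    count = 0
    for order in orders:
        if order["status"] == "paid":
            total += order["amount"]
            count += 1
    return {"total": total, "count": count}


def _paid(orders):
    """Extracted helper: keep only paid orders."""
    return [o for o in orders if o["status"] == "paid"]


def summarize_refactored(orders):
    """Refactored version built on the extracted helper."""
    paid = _paid(orders)
    return {"total": sum(o["amount"] for o in paid), "count": len(paid)}


SAMPLE_INPUTS = [
    [],
    [{"status": "paid", "amount": 10}],
    [{"status": "paid", "amount": 10}, {"status": "void", "amount": 99}],
]


def test_refactor_preserves_behavior():
    # Same inputs through both versions; any difference is a regression.
    for orders in SAMPLE_INPUTS:
        assert summarize_refactored(orders) == summarize_legacy(orders)
```

Once the parity test passes, the legacy copy can be deleted in the same change, since the refactor itself made it unused.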

Scenario 4: The Confusing Requirements

Your product manager says "make the form better." Claude assumes "better" means real-time validation, auto-save, progressive enhancement, and accessibility overhaul. Three days later, you learn they meant "add a character counter."

With Karpathy guidelines: Claude stops. "'Better' is ambiguous. Possible interpretations: (1) validation improvements, (2) UX enhancements, (3) accessibility compliance. Which aspects are priorities?" You clarify in 30 seconds. Claude implements exactly what's needed.

Installation & Setup: Get Protected in Under 60 Seconds

The repository offers two installation paths depending on your workflow.

Option A: Claude Code Plugin (Recommended)

This makes the guidelines available across all projects automatically.

First, add the marketplace:

/plugin marketplace add forrestchang/andrej-karpathy-skills

Then install the plugin:

/plugin install andrej-karpathy-skills@karpathy-skills

That's it. Every Claude Code session now inherits these behavioral constraints.

Option B: Per-Project CLAUDE.md

For project-specific application or when you can't use plugins.

New project:

# Download the guidelines directly into your project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

Existing project (append to existing CLAUDE.md):

# Add separator for clarity
echo "" >> CLAUDE.md

# Append Karpathy guidelines
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
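Running the append commands twice would duplicate the guidelines in CLAUDE.md. A small Python sketch of an idempotent append, assuming the guidelines text has already been downloaded; the `append_guidelines` function and the marker comment are hypothetical and not part of the repository.

```python
from pathlib import Path

# Hypothetical marker used to detect an earlier append.
MARKER = "<!-- karpathy-guidelines -->"


def append_guidelines(claude_md: Path, guidelines: str) -> bool:
    """Append guidelines to claude_md once; return True if the file changed."""
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    if MARKER in existing:
        return False  # already appended, leave the file alone
    # Keep existing project context first, separated by one blank line.
    prefix = existing.rstrip("\n") + "\n\n" if existing else ""
    claude_md.write_text(prefix + MARKER + "\n" + guidelines, encoding="utf-8")
    return True
```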

Cursor Users: You're Covered Too

The repository includes a committed Cursor project rule at .cursor/rules/karpathy-guidelines.mdc. Open the project in Cursor and the same principles apply automatically. See CURSOR.md for using the rule in other projects.

Real Code Examples: See the Guidelines in Action

Let's examine how these principles transform actual interactions. The repository's structure demonstrates the simplicity-first ethos—minimal files, maximum impact.

Example 1: The Goal-Driven Execution Pattern

Here's how multi-step tasks should be structured with verification loops:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]

This isn't pseudocode—it's the literal format the guidelines recommend for complex tasks. Let's see it applied:

Without guidelines:

# Claude receives: "Add user authentication"
# Result: 2,000-line auth system with speculative features

class AuthenticationManager:
    def __init__(self, strategy_factory, token_rotator, audit_logger):
        self.strategy_factory = strategy_factory  # Never used in MVP
        self.token_rotator = token_rotator        # Over-engineered
        self.audit_logger = audit_logger          # Not requested
        self.oauth_providers = {}                 # Speculative
    
    def authenticate(self, credentials, context=None, options=None):
        # 150 lines of abstraction over simple password check
        ...

With guidelines applied:

# Claude receives: "Add user authentication"
# Claude's internal plan:
# 1. Write test for valid login → verify: test passes
# 2. Write test for invalid password → verify: test passes  
# 3. Implement bcrypt password verification → verify: tests pass
# 4. Add session cookie → verify: integration test passes

import bcrypt
from flask import session

# Only what's needed. No speculation.
def verify_password(stored_hash: bytes, password: str) -> bool:
    """Constant-time password verification."""
    return bcrypt.checkpw(password.encode(), stored_hash)

def login_user(username: str, password: str, users_db: dict) -> bool:
    """Authenticate and establish session."""
    user = users_db.get(username)
    if not user or not verify_password(user['password_hash'], password):
        return False
    
    # Minimal viable session
    session['user_id'] = user['id']
    return True

The difference? Roughly 2,000 lines versus fewer than 20. Every line traces to the original request. No speculative abstractions. Verifiable at each step.
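The `[Step] → verify: [check]` format can also be read as a tiny control loop: do a step, check it, and refuse to continue if the check fails. A minimal sketch of that idea; the `Step` type and `run_plan` function are illustrative assumptions, not something the guidelines ship.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # does the work
    verify: Callable[[], bool]  # returns True when the step succeeded


def run_plan(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first failed verification."""
    completed = []
    for step in steps:
        step.action()
        if not step.verify():
            raise RuntimeError(f"verification failed after step: {step.name}")
        completed.append(step.name)
    return completed
```

Stopping at the first failed check keeps later steps from building on a broken foundation, which is the point of the checkpoints.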

Example 2: Project-Specific Customization

The guidelines explicitly encourage merging with your existing standards. Here's the pattern from the repository:

## Project-Specific Guidelines

- Use TypeScript strict mode
- All API endpoints must have tests
- Follow the existing error handling patterns in `src/utils/errors.ts`

This shows the design philosophy: foundational constraints + contextual specifics. The Karpathy principles prevent AI failure modes; your additions encode domain requirements.

Example 3: The Surgical Change Discipline

When editing existing code, the guidelines enforce strict boundaries. Consider this scenario:

# Original file with mixed concerns
import json
import os  # ← unused, pre-existing

class DataProcessor:
    def process(self, data):
        # Your task: add CSV output option
        return json.dumps(data)
    
    def legacy_helper(self):  # ← dead code, pre-existing
        pass

Without guidelines: Claude "notices" the unused import, removes it, deletes legacy_helper, reformats with Black, and converts to dataclass. Your "add CSV output" PR has 40 changed lines.

With guidelines: Claude adds only the CSV method, uses the existing style, leaves the os import alone, and mentions legacy_helper in its reply without touching it.

import json
import os  # ← unused, pre-existing: left alone
import csv  # only new import, needed for the feature

class DataProcessor:
    def process(self, data):
        return json.dumps(data)
    
    def process_csv(self, data, output_path):
        """Write a non-empty list of dicts to a CSV file."""
        with open(output_path, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)
    
    def legacy_helper(self):  # ← dead code, pre-existing: left alone
        pass

The diff: one new import and one new method, nothing removed. Exactly what was asked.
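One way to check a claim like this is to diff the file before and after the edit and count what changed. A minimal sketch using the standard library's `difflib`; the `count_changes` helper is an assumption for illustration, and in practice `git diff --stat` gives the same information.

```python
import difflib


def count_changes(before: str, after: str) -> tuple[int, int]:
    """Return (lines_added, lines_removed) between two versions of a file."""
    added = removed = 0
    # ndiff prefixes each line with "+ ", "- ", "  " or "? ".
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added, removed
```

A surgical edit shows up as additions only; a non-zero removed count on a pure "add a feature" task is a prompt to look closer.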

Advanced Usage & Pro Tips

Layering with Existing CLAUDE.md Files

Don't replace your project's existing context—append these guidelines. The repository specifically recommends this pattern for accumulated project wisdom:

# Preserve existing project context
cat existing-claude.md karpathy-guidelines.md > CLAUDE.md

The Judgment Exception

The guidelines include a critical escape hatch: "For trivial tasks (simple typo fixes, obvious one-liners), use judgment—not every change needs the full rigor." This prevents the guidelines themselves from becoming bureaucracy. The bias is caution over speed for non-trivial work—not slowness for everything.

Verification-Driven Development

The most powerful advanced pattern: always pair guidelines with test commands. Configure your environment so Claude can execute tests autonomously:

# In your CLAUDE.md or project config
## Testing
- Run `pytest` after implementation changes
- Run `ptw` (from the separate pytest-watch package; pytest itself has no watch flag) to re-run tests on file changes during development
- Coverage minimum: 80% for new code

This lets the "Goal-Driven Execution" principle fully activate—Claude writes tests, implements, verifies, loops.

Cursor-Specific: Cross-Project Rules

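The .cursor/rules/karpathy-guidelines.mdc file can be symlinked or copied into other projects. Cursor reads project rules from each project's own .cursor/rules directory, so maintain one canonical version and distribute it where needed:

# Copy the rule into another project (the target path is a placeholder)
mkdir -p /path/to/other-project/.cursor/rules
cp .cursor/rules/karpathy-guidelines.mdc /path/to/other-project/.cursor/rules/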

How It Compares: Why Not Just Prompt Better?

Approach	Effectiveness	Maintenance	Portability	Behavioral Depth
Manual prompting	Low—forgets between sessions	High—repeat every time	None	Surface-level
Custom GPT/Claude instructions	Medium—project-agnostic	Medium—one-time setup	Limited to platform	Medium
Karpathy CLAUDE.md	High—embedded in context	Low—single file	Universal—any project	Deep—structural constraints
IDE linting rules	Medium—catches symptoms	High—complex configs	Varies by tool	Shallow—syntax only
Code review enforcement	High—human judgment	Very high—time intensive	Universal	Deep—but reactive

The key differentiator: proactive behavioral shaping versus reactive correction. Linting catches overengineering after it's written. These guidelines prevent it from being written. Manual prompting relies on your memory and consistency; this file is always present, always enforced.

Compared to platform-specific custom instructions, this approach ports across tools: the same principles ship as CLAUDE.md for Claude Code and as an .mdc rule for Cursor, and they can be adapted for any other tool that reads project context files.

Frequently Asked Questions

Will this slow down Claude Code for simple tasks?

No. The guidelines explicitly include a judgment clause for trivial changes. A typo fix doesn't need the full reasoning ritual. The bias is caution for non-trivial work where mistakes are costly.

Can I use this with GitHub Copilot or other AI tools?

The CLAUDE.md format is Claude Code-specific, but the principles translate. For Copilot, you'd adapt them into its repository custom instructions file (.github/copilot-instructions.md). The Cursor .mdc rule format works in Cursor IDE. The core value is the principles, not the specific file format.

What if my project already has a CLAUDE.md?

Append these guidelines. The repository specifically recommends this—project-specific rules layer on top of foundational behavioral constraints. Use echo "" >> CLAUDE.md and curl-append as shown in the installation section.

Do I need to understand Andrej Karpathy's original thread to use this?

Not at all. The repository distills his observations into actionable rules. Reading the original thread adds context but isn't required for effective use.

How do I know the guidelines are actually working?

Watch for four signals: fewer unnecessary changes in diffs, simpler first-attempt code, clarifying questions before implementation, and clean PRs without drive-by refactoring. The repository provides these exact verification criteria.
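The first signal, smaller diffs, is easy to measure. A sketch that summarizes the output of `git diff --numstat`, which prints one `added<TAB>removed<TAB>path` line per file and uses `-` in place of counts for binary files; the `summarize_numstat` helper is an assumption for illustration.

```python
def summarize_numstat(numstat: str) -> dict[str, int]:
    """Summarize `git diff --numstat` output into file and line counts."""
    files = added = removed = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        add, rem, _path = line.split("\t", 2)
        files += 1
        # Binary files report "-" instead of line counts.
        added += int(add) if add != "-" else 0
        removed += int(rem) if rem != "-" else 0
    return {"files": files, "added": added, "removed": removed}
```

Tracking these numbers across a few comparable tasks, before and after installing the guidelines, gives a rough but concrete view of whether diffs are actually shrinking.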

Can I modify the guidelines for my team's needs?

Absolutely—MIT licensed. The repository encourages customization, showing exactly how to add project-specific sections. The core four principles are foundational; adapt the specifics to your domain.

Does this work for non-coding tasks with Claude?

The principles generalize—"Think Before Acting," "Simplicity First," "Surgical Changes," and "Goal-Driven Execution" apply broadly. However, the specific formulations are optimized for software engineering contexts.

The Bottom Line

Andrej Karpathy didn't just identify why LLMs write bad code—he described a fundamental mismatch between how models are trained and how software should be built. The andrej-karpathy-skills repository bridges that gap with elegant minimalism: one file, four principles, immediate behavioral transformation.

You have two choices. Keep accepting AI-generated code that silently assumes, overcomplicates, and expands scope without permission. Or install these guidelines once, spend 30 seconds on setup, and never again discover that "simple fix" became a 1,000-line architecture expedition.

The repository is waiting at github.com/forrestchang/andrej-karpathy-skills. The plugin install takes two commands. The per-project setup takes one curl.

Your future self—reviewing clean, minimal diffs that do exactly what was asked—will thank you.

Install it now. Stop the bloat before your next prompt.
