How Much Should You Trust Claude Code? The 4-Level Autonomy Ladder

One Friday night I asked Claude Code to “just fix the broken links in the old posts on this blog” and went to bed. The next morning I opened the commit history and found that 12 articles had been rewritten. The links were fixed, sure. But while it was in there, it had also “tidied up” the wording of my headings, overwriting some deliberately old-fashioned phrasing I had kept on purpose.

It took me two hours to undo. Why would supposedly-smart Claude Code do all this extra work? The answer is simple: I had never once told it how far it was allowed to go.

The opposite happens too. Spooked, I flipped to a setup where it asked “can I run this?” before every single one-line edit. Within 30 minutes I was exhausted from clicking the approve button, and I ended up writing the changes faster by hand. Trust it too much and you crash; stop it too much and you burn out. Most people get lost wandering between those two extremes.

This article is about ending that wandering with a “4-level autonomy ladder.” Read-only, reversible edits, publishing, and humans-only — draw those four lines once up front, and almost every per-task judgment call disappears.

Key takeaways

Sorting Claude Code’s work by danger level into 4 tiers makes decisions easier: read-only / reversible edits / publish and deploy / humans only.
Each level must have one piece of “proof it finished” decided in advance. A diff appears, the build passes, the public URL is correct, and so on.
Deletes, production data, billing, and force pushes never go automatic, no matter how comfortable you get. Only here does a human press the button every time.
Write the levels into a single YAML file next to CLAUDE.md, and Claude Code itself can self-report “which level am I on right now.”
Start one level lower than feels necessary, and only promote operations you have confirmed are safe. Never go the other way.

Why the line matters more than the smarts

Most people who stumble with Claude Code fail at how they bound the work, not at the model’s capability. That was me at the top. The instruction itself wasn’t bad. I just communicated the “links only” scope in prose alone. A request written in prose is like a verbal instruction to a busy colleague on a hectic morning — it’s trivially easy for the interpretation to drift.

That’s where a danger-based line helps. You decide what’s allowed not by the wording of a request but by a “level.” Then every time Claude Code is about to do something, you can check it against “which level is this operation?” The vague battle over interpreting English turns into a mechanical check.

This idea isn’t new. In a factory, the visitor who only tours, the worker who assembles parts, the person who presses the ship button, and the one who opens the safe all have separate permissions from day one. We’re just drawing the same lines for Claude Code.

What the 4 levels contain

Here are the 4 levels I actually use. Let me show the whole picture in a table first.

Level	What it does	Proof it finished
0 read-only	Map file structure, surface risks, suggest check commands	The final note lists the files it read
1 reversible edit	Fix one article, update one test	A diff appears and the build passes
2 publish / deploy	Deploy after build, verify the public URL	h1, canonical URL, CTA, and rollback steps are all present
3 humans only	(Don’t let Claude Code do it)	—

Level 3 covers secrets, billing, production data, and force pushes. No matter how comfortable you get, this never gets automated. The loss when you get it wrong isn’t worth the effort you’d save.

The key thing at every level is to decide the “proof it finished” in advance. Without proof, Claude Code can say “done” and you have no way to know if it really finished. A diff appears, the build passes, you open the public URL and the h1 is correct — pick machine-checkable things like these as your proof.

What to delegate to the AI vs. decide yourself

Once the lines are clear, the division of labor falls out naturally.

What you can safely hand Claude Code is work with lots of steps whose result a machine can verify. Reading many files and summarizing them, fixing articles in a fixed format, running tests and reading the logs. It’s faster and more accurate than a human at these.

What you should decide yourself is irreversible calls and high-loss operations. Whether this article is really OK to publish, whether this old phrasing is OK to delete, whether this customer data is OK to overwrite. Here you let Claude Code “propose,” but a human presses “execute.”

When you’re unsure, there’s one test: “If this fails, can I undo it in 10 minutes?” If yes, it’s level 1. If no, it’s level 2 or 3. My opening disaster took two hours to undo — which means it was really work that needed level-2 caution.

A copy-paste level definition

If you keep the levels only in your head, they’ll drift. Write them out into a single YAML file and drop it at the project root as autonomy.yml. Reference it from CLAUDE.md, and Claude Code will start self-reporting things like “I’m on level 1, so I won’t publish.”

# autonomy.yml — the autonomy lines you hand Claude Code
safe_autonomy_ladder:
  level_0_read_only:
    allowed: ["map file structure", "surface risks", "suggest check commands"]
    proof: "the final note lists the files it read"
  level_1_reversible_edit:
    allowed: ["fix one article", "update one test"]
    proof: "git diff appears and the build passes"
  level_2_publish_or_deploy:
    allowed: ["deploy after build", "verify the public URL"]
    proof: "h1, canonical URL, CTA, and rollback steps are present"
  level_3_owner_only:
    allowed: []
    examples: ["secrets", "billing", "force push", "customer data"]

In CLAUDE.md, one sentence is enough.

Before working, read autonomy.yml and declare upfront which level this task is.
For any level-2-or-higher operation, do not execute it; wait for human approval.

A verification script that checks the level by machine

Just having it declare isn’t reassuring enough, so I keep a machine watching “whether a level-2 operation (publishing) happened without proof.” The script below goes and fetches the public URL after a deploy and checks that the h1 isn’t empty and that the canonical URL points at my own slug. It runs as-is on Node.js 18 or later.

// verify-publish.mjs — machine-check the "proof it finished" for level 2
// Usage: node verify-publish.mjs https://claudecode-lab.com/blog/your-slug

const url = process.argv[2];
if (!url) {
  console.error("Pass the public URL as an argument");
  process.exit(1);
}

const res = await fetch(url);
if (res.status !== 200) {
  console.error(`HTTP ${res.status}: not published yet`);
  process.exit(1);
}

const html = await res.text();

// Does an h1 exist with actual content?
const h1 = html.match(/<h1[^>]*>(.*?)<\/h1>/s);
const hasH1 = Boolean(h1 && h1[1].replace(/<[^>]+>/g, "").trim());

// Does the canonical URL point at this page itself?
const canonical = html.match(/<link[^>]+rel=["']canonical["'][^>]*>/i);
const canonicalOk = Boolean(canonical && canonical[0].includes(new URL(url).pathname));

console.log(`has h1:          ${hasH1 ? "OK" : "NG"}`);
console.log(`canonical match: ${canonicalOk ? "OK" : "NG"}`);

if (hasH1 && canonicalOk) {
  console.log("Level-2 proof is complete. You can treat publishing as done.");
} else {
  console.error("Proof is missing. Don't mark this as done yet.");
  process.exit(1);
}

If you assume HTTP 200 alone means success, you won’t notice when a homepage fallback or an old article is being served. Only once you check the h1 and canonical URL too can you say “this level is finished.”

Three places where this pays off

1. Right after entering a new repository. Letting it edit before it knows the structure or the commands tends to break config files. Allow only level 0 at first, and have it report the file layout, the usable commands, and the places that are dangerous to touch. Once there’s a map, you promote to level 1.

2. Fixing a typo in an article. Level 1 is plenty here. Once a diff appears and the build passes, the proof is complete. My opening disaster could have been prevented if I’d spelled out the scope here: “fix typos only, don’t change the wording.” Here’s a template request to hand Claude Code.

Work at level 1.
Target: only obvious typos in content/blog/foo.mdx
Do NOT: rephrase headings, change wording, or edit other files
When done, show me git diff and self-confirm that the changes are typo fixes only.

3. Just before a production publish. At level 2, always have it confirm: does the build pass, are the environment variables present, does the diff match expectations, and is there a rollback procedure. Run the verification script above last, and you can check the contents of the public URL too.

Three mistakes I made

Honestly, I crashed many times before settling on these 4 levels.

The first was skipping levels all at once. When I skipped level 0 and had it do a big edit straight at level 1, the diff came out too large to verify and I couldn’t tell which parts were even correct. Now I always promote from 0 in order.

The second was calling it “done” on a local build alone. It passed on my machine, so I published. But production was showing an old page, and I didn’t notice for half a day. Since then I don’t call it done until I’ve seen the public URL’s h1 and canonical URL.

The third was relying on my own eyes alone for approval. “I’ll just check it at the end” always breaks down on a tired night. Once I built machine-checkable checks — character count, broken links, type errors — into each level’s “proof,” my late-night oversights dropped dramatically.

How to start

Don’t aim for a fully-automatic genius agent right away. Pick one small job you can undo even if it fails. Typo checks on a draft, a first-pass review of a change, a pre-publish check before staging. That’s about the right size.

The order is always the same: (1) narrowly decide the read scope, (2) make the deliverable explicit, (3) have a command do the checking wherever possible, (4) set every dangerous operation (delete, production data, billing, force push) to “ask a human” at first. Only promote operations you’ve confirmed are safe, one level at a time. You never demote in the other direction.

If you stumble on the fine-grained permission settings, read Claude Code getting started guide first, and for how to write level rules into CLAUDE.md, see CLAUDE.md best practices — then these 4 levels drop straight into your config. If non-engineers will be on the team, Claude Code for non-engineers pairs well too.

FAQ

Q. Do I have to set up the levels by hand every time? No. Make one autonomy.yml and reference it from CLAUDE.md, and from then on Claude Code declares “I’m on level 1” itself. You set it up only once.

Q. Is it OK to automate all the way to level-2 “publishing”? If the proof comes together by machine, it’s fine to automate after you’re comfortable. But always insert a mechanism that checks the contents of the public URL, like the verification script above. Trusting HTTP 200 alone is the most dangerous thing.

Q. How do I tell which operations belong in level 3? Think “can I get away with an apology, or does it require compensation?” Overwriting customer data, billing, and production deletes are the latter, so they’re level 3. They never go automatic, even when you’re comfortable.

Q. Do small teams need the lines too? If anything, the smaller the team, the more it helps. It’s easy for “who approved this” to get fuzzy, so sharing the level table makes “who signs off on this operation” obvious at a glance.

Q. Can’t I skip the levels if I just craft the prompt well? Prompts drift in interpretation. Building the skill to write good requests is worth it separately via advanced prompt engineering, but the levels are a “safety net that doesn’t depend on how slick your wording is.” Having both is strongest.

What happened when I actually tried it

Since I put these 4 levels into my own blog operation, the “it did things I never asked for” disasters from the opening dropped to zero. I confirmed two things.

One: once I referenced autonomy.yml from CLAUDE.md, Claude Code began declaring on its own, before working, “this time it’s level 1, so I won’t publish.” Hand over the line as a level instead of prose, and I felt firsthand that the interpretation stops drifting.

Two: once I started always running verify-publish.mjs after publishing, I could detect the “old page showing in production” phenomenon — which once went unnoticed for half a day — right after publishing. Just looking at the h1 and canonical URL fills in the HTTP 200 trap.

Rather than hunting for a smarter model, decide up front on levels where you won’t get hurt when you fall. It looks like a detour, but my honest feeling now is that it’s the fastest path. If you’re at the stage of organizing Claude Code permission rules for a team, we can design the lines together in training and consultation. For the official permission settings, also check the Anthropic documentation.

How Much Should You Trust Claude Code? The 4-Level Autonomy Ladder

Key takeaways

Why the line matters more than the smarts

What the 4 levels contain

What to delegate to the AI vs. decide yourself

A copy-paste level definition

A verification script that checks the level by machine

Three places where this pays off

Three mistakes I made

How to start

FAQ

What happened when I actually tried it

Free PDF: Claude Code Cheatsheet

Level up your Claude Code workflow

Related Posts

You Memorized Every Claude Code Command and Still Freeze? Here's the Safe First Move

Claude Code on an Inherited Repo: Make the First Edit Safe

How to Write a Brief for Claude Code's First Task

Related Products

The Complete Claude Code Setup & Configuration Guide

Claude Code Quick Reference Cheatsheet