How Much Should Claude Code Do Today? A 4-Level Approval Worksheet
Tired of clicking 'Allow?' every step? Sort Claude Code's work into 4 levels to draw the line between what to delegate and what you decide.
It was a Friday evening, and I only meant to push a tiny fix to staging.
I asked, “Fix the typo in the docs, and while you’re at it, check that the build passes.” Claude Code fixed the typo, then went ahead and bumped two dependency packages on its own, and even rewrote a reference in .env.production. The local build passed, so the agent reported back with a calm “Done.”
What made my stomach drop was that I didn’t notice any of it until I read the diff all the way to the bottom. It wasn’t that the AI was bad. The real cause was that I had never once put into words how far it was allowed to go.
The flip side happened another day: “Allow? Yes/No” popped up about thirty times in a row, and I burned through all my willpower just to land a single commit. Give it too much rope and you have an accident; give it too little and nothing moves. This article is about the one-page worksheet I use to draw the right line in between.
Key takeaways
- Sort the work you hand to Claude Code into four levels: “read only,” “fix only,” “publish,” and “touch secrets.”
- For each level, decide up front who gives the final OK and what evidence proves it’s safe.
- Levels 0 and 1 go to the AI, level 2 needs a human check, and level 3 is touched only by the responsible person.
- Declaring “today we go this far” in one sentence each morning cut my approval prompts by more than half, by feel.
- I’ve included copy-paste ledger code and a one-minute morning template.
Why decide an “approval budget” up front
What drains you on approvals isn’t Claude Code’s capability. It’s starting the run without deciding how far you’ll allow it today.
When you haven’t decided, you fall into one of two traps. Either you get tired and keep hitting “Allow” on everything, until one day a dangerous operation slips through, or you get so cautious that you gate even typo fixes and work grinds to a halt. Both share the same flaw: the decision is left to whatever mood you’re in at that moment.
That’s where the idea of an “approval budget” comes in. Like a money budget, you decide the boundary in advance: everything up to this lane is free today, and past it a human decides. With a boundary set, you no longer have to feel a flutter of anxiety at every reply the AI gives. What you watch is no longer “is the AI smart?” but “which lane did it stop in?”
Putting your decision criteria into words also keeps teams from arguing. Instead of “I stopped it because something felt off,” you can say “this is level 2, so it’s a human’s turn to check.” The general thinking behind permission design is touched on in the Claude Code getting started guide, but here I’m narrowing in on the grubby, practical job of “today’s line.”
Sort the work into four levels
First, sort the work you want Claude Code to do into four levels, in order of danger. Don’t overthink it. Sort only by “can it be undone?”, “will it go public?”, and “does it touch money or secrets?”
| Level | Example work | Who gives final OK | Evidence it’s safe |
|---|---|---|---|
| 0 | Read files, understand the structure | Leave to AI | List of what was read |
| 1 | Fix one reversible file | AI (human reviews diff) | Diff and build result |
| 2 | Publish to the live site | Human decides | Public URL and rollback steps |
| 3 | Touch secrets, billing, customer data | Owner only | Written approval |
The heart of this table is the two right-hand columns. Decide “who gives the OK” and “what evidence proves it’s safe” before the work starts. Decide it afterward, and you’ll get swept along by the momentum of the AI saying “Done” and skip the check.
The key word in level 1 is “reversible.” A typo fix or an added comment can be rolled back instantly even if it’s wrong. So you leave it to the AI, and the human just glances at the diff. A dependency package update, on the other hand, gets bumped up to level 2. It may look small, but its blast radius is impossible to read in advance. My opening accident was exactly the case of treating that as level 1.
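The three sorting questions can be written down as a small function. This is a minimal sketch, not anything built into Claude Code: the name `classifyTask` and its four boolean fields are hypothetical labels for the questions above, and modeling a dependency update as "not safely reversible" is my assumption for fitting it into level 2.

```javascript
// Hypothetical helper: turns the sorting questions into a level from 0 to 3.
// All field names are assumptions made for this sketch.
function classifyTask({ changesAnything, reversible, goesPublic, touchesMoneyOrSecrets }) {
  if (touchesMoneyOrSecrets) return 3; // secrets, billing, customer data
  if (goesPublic) return 2; // visible to the outside world once it runs
  if (changesAnything && !reversible) return 2; // cannot be undone instantly
  if (changesAnything) return 1; // reversible edit
  return 0; // read only
}

const examples = [
  { name: "read the repo structure", changesAnything: false, reversible: true, goesPublic: false, touchesMoneyOrSecrets: false },
  { name: "fix a typo in one file", changesAnything: true, reversible: true, goesPublic: false, touchesMoneyOrSecrets: false },
  // Assumption: a dependency bump is treated as not safely reversible
  { name: "bump a dependency", changesAnything: true, reversible: false, goesPublic: false, touchesMoneyOrSecrets: false },
  { name: "edit .env.production", changesAnything: true, reversible: true, goesPublic: false, touchesMoneyOrSecrets: true },
];

for (const task of examples) {
  console.log(`${task.name}: level ${classifyTask(task)}`);
}
```

The order of the checks matters: the most dangerous question is asked first, so a task that is both reversible and secret-touching still lands at level 3.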
What goes to the AI, and what you decide
Let me make the boundary a bit sharper. What you can leave to the AI is work where a mistake is noticed instantly and undone instantly. What a human should decide is work that affects the outside world the moment it runs.
- Leave to the AI: read, research, draft, fix one reversible file, run tests
- Human decides last: publish, change production data, register with external services, update dependencies, delete
When in doubt, bump it up one level. Remember just that and you won’t be far off. Only operations you’re fully confident are safe get lowered one notch at a time and automated later. The trick is not to aim for full autonomy from day one. This “promote gradually” mindset pairs well with writing your project rules; see how to write CLAUDE.md, and leave your chosen boundary in a file so it’s reproducible.
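The "when in doubt, bump it up one level" rule is easy to encode. The sketch below is illustrative only; `effectiveLevel`, `aiMayProceed`, and the `unsure` flag are hypothetical names, and the cap of 3 simply mirrors the four-level table.

```javascript
// Hypothetical helpers for the "when in doubt, bump it up one level" rule.
const MAX_LEVEL = 3; // highest level in the four-level table

// Returns the level to act on: one higher when you are unsure, capped at 3.
function effectiveLevel(level, { unsure = false } = {}) {
  return unsure ? Math.min(level + 1, MAX_LEVEL) : level;
}

// The AI proceeds alone only if the effective level fits under today's ceiling.
function aiMayProceed(level, todayMax, options) {
  return effectiveLevel(level, options) <= todayMax;
}

console.log(aiMayProceed(1, 1)); // true
console.log(aiMayProceed(1, 1, { unsure: true })); // false: treated as level 2
```

Lowering a level later is then a deliberate edit to the ledger, not something that happens by default.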
A copy-paste approval ledger
Words alone get forgotten. So let’s turn the four levels into something a machine can read, and filter out “how far today” on demand. If you have Node.js, save it as approval-budget.mjs and it runs as-is.
// Approval ledger: each task carries its danger level, owner, and evidence
const approvalBudget = [
  { action: "read files", level: 0, owner: "AI", proof: "list of what was read" },
  { action: "fix one reversible file", level: 1, owner: "AI (human reviews)", proof: "diff and build result" },
  { action: "publish to live site", level: 2, owner: "human", proof: "public URL and rollback steps" },
  { action: "touch secrets or billing", level: 3, owner: "owner only", proof: "written approval" },
];

// Today's ceiling. 0 = read only, 1 = leave reversible fixes to the AI.
// Defaults to 1 when APPROVAL_MAX is unset.
const todayMax = Number(process.env.APPROVAL_MAX ?? 1);
if (!Number.isInteger(todayMax) || todayMax < 0 || todayMax > 3) {
  // A non-numeric value becomes NaN, which would silently empty both tables
  console.error("APPROVAL_MAX must be an integer from 0 to 3");
  process.exit(1);
}

// Split the ledger at today's ceiling
const allowedToday = approvalBudget.filter((item) => item.level <= todayMax);
const needsHuman = approvalBudget.filter((item) => item.level > todayMax);

console.log(`Today's AI ceiling: level ${todayMax}`);
console.table(allowedToday);
console.log("Work a human decides:");
console.table(needsHuman);
Running it is this simple. You switch “today’s ceiling” with an environment variable.
# Today, leave "reversible fixes" to the AI
APPROVAL_MAX=1 node approval-budget.mjs
# Today, keep it to reading only
APPROVAL_MAX=0 node approval-budget.mjs
Keep the field names and rewrite the contents of action and proof to fit your own project. Hand this code to Claude Code and ask it to “fill in the values for our repo,” and you’ll have a working draft in seconds.
A one-minute morning prompt template
Once the ledger is ready, tell the AI “today’s lane” before you start work. Copy the text below and just fill in the blanks.
Let's set today's working lane up front.
- Today's goal: (e.g. fix typos and broken links in one blog post)
- May read: only src/content/blog/
- May edit: just one file within the above (reversible changes only)
- May run: npm run lint, run the tests
- Do not touch: .env, production deploy, dependency updates, deletions
Rules:
- For level 2 or higher (publish, production data, dependency updates, deletions), always stop and check with me first.
- After fixing, show me the diff and build result together at the end as "evidence."
- Do not end with just "Done." Write down which command you used to verify.
Just having this one block makes the AI stop “doing everything just in case” and operate within the lane. Once you’re used to it, move this content into your project rule file following how to write CLAUDE.md, and the daily paste disappears too.
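If filling in the blanks by hand gets tedious, the template can be generated. This is a rough sketch under my own assumptions: `buildMorningPrompt` and its parameter names are hypothetical, and the output is plain text you would still paste into Claude Code yourself.

```javascript
// Hypothetical builder for the morning prompt; parameter names are assumptions.
function buildMorningPrompt({ goal, mayRead, mayEdit, mayRun, doNotTouch }) {
  return [
    "Let's set today's working lane up front.",
    `- Today's goal: ${goal}`,
    `- May read: ${mayRead}`,
    `- May edit: ${mayEdit}`,
    `- May run: ${mayRun.join(", ")}`,
    `- Do not touch: ${doNotTouch.join(", ")}`,
    "Rules:",
    "- For level 2 or higher (publish, production data, dependency updates, deletions), always stop and check with me first.",
    '- After fixing, show me the diff and build result together at the end as "evidence."',
    '- Do not end with just "Done." Write down which command you used to verify.',
  ].join("\n");
}

// Example values taken from the template above
const prompt = buildMorningPrompt({
  goal: "fix typos and broken links in one blog post",
  mayRead: "only src/content/blog/",
  mayEdit: "just one file within the above (reversible changes only)",
  mayRun: ["npm run lint", "the tests"],
  doNotTouch: [".env", "production deploy", "dependency updates", "deletions"],
});
console.log(prompt);
```

Keeping the rules as fixed strings and only the lane as parameters means the safety wording does not drift from day to day.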
Three places this pays off
1. Quality-checking high-volume blog or doc work. With just “fix the article,” the AI rewrites the body, the image paths, and the links all at once. Split it in the ledger as “body typos are level 1, publishing is level 2,” and you can hand off the prose while keeping the publish button in your own hands. Have it produce the diff and build result as evidence, and your late-night review gets a lot easier.
2. Sorting inbound inquiries. Reading and classifying incoming inquiries is level 0, fine to leave to the AI. But registering them in the customer ledger is level 3. Even if the AI decides “this looks like a deal,” writing to the production database stays on hold until the responsible person presses the button. Enforce this in the ledger and the accident of auto-registering a misclassified customer disappears.
3. A breath before deploy. Put publishing at level 2 without exception. Don’t mark it “Done” just because the local build passed; stop until you’ve checked the public URL, the heading, and the rollback steps. My opening blunder, the “unrequested dependency update,” would have been stopped by a human check every time if level 2 had been explicit.
Common stumbles and how to fix them
The most common one is trying to finish everything in one request, creating a giant diff nobody can verify. The fix is simple: narrow each request to “one deliverable per request.” One article, one PR, one config spot. Cut it small and you can read the diff all the way through.
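One way to notice an oversized diff before you start reading is to count what changed. The sketch below parses the tab-separated output of `git diff --numstat` (added lines, deleted lines, path). The function names and the limits of one file and 200 lines are my own assumptions, chosen to match "one deliverable per request."

```javascript
// Parses `git diff --numstat` output: "<added>\t<deleted>\t<path>" per line.
function parseNumstat(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [added, deleted, ...pathParts] = line.split("\t");
      // Binary files report "-" for both counts; count them as 0 lines
      return {
        path: pathParts.join("\t"),
        added: Number(added) || 0,
        deleted: Number(deleted) || 0,
      };
    });
}

// Assumed limits for a small request: one file, and a diff short enough to read in full.
function checkScope(files, { maxFiles = 1, maxLines = 200 } = {}) {
  const lines = files.reduce((sum, f) => sum + f.added + f.deleted, 0);
  const problems = [];
  if (files.length > maxFiles) problems.push(`${files.length} files changed (limit ${maxFiles})`);
  if (lines > maxLines) problems.push(`${lines} lines changed (limit ${maxLines})`);
  return problems;
}

// Sample input shaped like the opening accident: a doc fix plus package.json
const sample = "3\t1\tsrc/content/blog/post.md\n12\t12\tpackage.json\n";
console.log(checkScope(parseNumstat(sample)));
```

An empty array means the diff is within the assumed limits; anything else is a cue to split the request before reviewing.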
Next most common is treating local build success as done. The live site is showing a different page or the homepage, yet you see HTTP 200 and feel safe. Put “check the public URL and heading” in the evidence column and you’ll catch this before calling it finished.
The third is not recording what you tried. The next day you redo the same decision from scratch. Leave the short handoff note described in the “A handoff note” section and tomorrow’s you won’t be lost. If you want to raise the floor on how you ask Claude Code itself, read advanced prompt engineering alongside this, and your lane-setting will get a notch sharper.
FAQ
Q. Should I split the approval levels into finer steps? Four levels is plenty at first. The more you add, the more complex operations become, and in the end nobody follows them. Run it for a while, and only branch out the spots that feel too coarse later.
Q. How do I tell whether a level-1 task is “reversible”? Judge on two points: “can git roll it back in one command?” and “does it affect the outside world?” File edits roll back, but deploys, billing, sending email, and deletions do not. When in doubt, bump it to level 2.
Q. On a team, who decides the levels? The person starting the work declares the day’s ceiling in the morning, and the team names in advance who decides level 2 and above. If the responsible person is out, decide not to do level-3 work that day, and you stay safe.
Q. Pasting the prompt every time is a pain. Once the lane settles, move it into your project rule file (CLAUDE.md). The AI reads it every time, so the paste is no longer needed.
Q. Can non-engineers use this worksheet? Yes. Even without running the code, you can draw the line with just the four-level table and the prompt template. For non-engineer use, Claude Code for non-engineers is also a good reference.
A handoff note
Leave the day’s decision in a short note and tomorrow’s you, or your team, won’t repeat the same hesitation. Copy the form below and just fill it in.
- Date: 2026-06-07
- Today's goal: fix typos and broken links in one blog post
- Today's ceiling: level 1 (reversible fixes only)
- Evidence: diff, npm run build log, checked the public URL heading
- Where a human stopped it: dependency update (held, since it's level 2)
- Note for next time: do dependency updates together on a separate day at level 2
With this note, the post-publish check is easy too. HTTP 200 alone isn’t enough: on the public URL, check that the heading, the canonical URL, the hero image, and the opening of the body all really belong to this article. If a different article or the homepage shows up, treat it as unpublished and redo the build and deploy. Anthropic’s official docs also cover the thinking behind permission design.
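Part of that post-publish check can be sketched in code. The function below is a rough heuristic under my own assumptions: `checkPublishedPage` and the fields of `expected` are hypothetical names, it covers only the status, the first heading, and the canonical URL, and it uses regular expressions instead of a real HTML parser. The hero image and the body opening still need a human look.

```javascript
// Hypothetical check of a published page. Regex matching is a rough
// heuristic for this sketch, not a full HTML parser.
function checkPublishedPage({ status, html }, expected) {
  const problems = [];
  if (status !== 200) problems.push(`HTTP status was ${status}`);

  // First <h1>, with any nested tags stripped
  const h1 = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  const heading = h1 ? h1[1].replace(/<[^>]+>/g, "").trim() : "";
  if (heading !== expected.heading) problems.push(`heading was "${heading}"`);

  // <link rel="canonical" href="...">
  const canonical = /<link[^>]*rel=["']canonical["'][^>]*>/i.exec(html);
  const href = canonical ? /href=["']([^"']+)["']/i.exec(canonical[0]) : null;
  if (!href || href[1] !== expected.canonicalUrl) problems.push("canonical URL did not match");

  return problems; // an empty array means the page looks like the right article
}

// With Node.js 18 or later, the inputs could come from the built-in fetch:
//   const res = await fetch(url);
//   checkPublishedPage({ status: res.status, html: await res.text() }, expected);
```

Because the homepage also answers with HTTP 200, the heading and canonical comparisons are what actually distinguish "published" from "something is being served."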
What happened when I actually tried it
I applied this worksheet to my own blog operation for two weeks.
What helped most was getting into the habit of pasting “today we go to level 1” in the morning. That alone cut the “Allow?” prompts per commit by more than half, by my rough estimate. When the AI tried to step into level 2 or higher, it stopped exactly as the template’s rules said and asked me. The accident from the opening, where I only noticed afterward that dependencies had been updated, has not happened again since.
What I also learned is that a level table you write once and never run never gets used. Actually running the ledger code to put “today’s delegated work” on screen before starting raised the odds I’d stick to it. Rather than hunting for a smarter AI, draw a lane up front that you can recover from if you fall. It’s unglamorous, but this is what lets me delegate with the least stress right now.
If you want to extend this boundary across a whole team or into production operations, we can work out the concrete lane design together in a training or consultation session. For now, start tomorrow morning by pasting that one sentence: “today we go to level 1.”
About the Author
Masa
Engineer focused on practical Claude Code workflows. Runs claudecode-lab.com, a 10-language technical media site.