Claude Code API Cost Mastery: 5 Proven Techniques to Cut Bills from $450 to $45/Month
Real numbers behind Claude Code API pricing. Learn how prompt caching, model optimization, and batching achieved a 90% cost reduction—from $450 to $45 per month.
“I used Claude Code every day last month and got a $450 API bill”—this is a story more and more engineers are telling. Claude Code is powerful, but costs can vary by 10× or more depending on how you use it.
On this site (claudecode-lab.com) we auto-generate three multilingual articles every day with Claude Code. In the first week we burned through $380, but after optimization we’re doing the same workload for under $40/month. Here’s every step that achieved a 90% reduction.
First: Understand Where You’re Being Charged
To cut costs you need to know exactly what you’re paying for.
Claude API cost = uncached input tokens × input rate + cache-write tokens × cache-write rate + cache-read tokens × cache-read rate + output tokens × output rate
Pricing by Model (as of April 2026)
| Model | Input (standard) | Input (cache write, 5-min) | Input (cache read) | Output |
|---|---|---|---|---|
| claude-opus-4-6 | $15/1M | $18.75/1M | $1.50/1M | $75/1M |
| claude-sonnet-4-6 | $3/1M | $3.75/1M | $0.30/1M | $15/1M |
| claude-haiku-4-5 | $0.80/1M | $1/1M | $0.08/1M | $4/1M |
Two critical insights:
- Output tokens cost 5× as much as input tokens → trimming output alone gives a large reduction
- Cache reads are 1/10 the price of standard input → caching is your biggest lever
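To make both insights concrete, here is a small sketch that prices the same 10,000 tokens on Sonnet three ways. The rates are copied from the table above; `costOf` is an illustrative helper, not part of any SDK.

```typescript
type Model = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";
type Kind = "input" | "cacheRead" | "output";

// USD per 1M tokens, copied from the pricing table above
const PRICES: Record<Model, Record<Kind, number>> = {
  "claude-opus-4-6": { input: 15, cacheRead: 1.5, output: 75 },
  "claude-sonnet-4-6": { input: 3, cacheRead: 0.3, output: 15 },
  "claude-haiku-4-5": { input: 0.8, cacheRead: 0.08, output: 4 },
};

// Cost in USD of `tokens` tokens of one billing kind on one model
function costOf(model: Model, kind: Kind, tokens: number): number {
  return (tokens * PRICES[model][kind]) / 1_000_000;
}

// The same 10,000 tokens on Sonnet, billed three different ways
console.log(costOf("claude-sonnet-4-6", "output", 10_000).toFixed(3));    // 0.150
console.log(costOf("claude-sonnet-4-6", "input", 10_000).toFixed(3));     // 0.030
console.log(costOf("claude-sonnet-4-6", "cacheRead", 10_000).toFixed(3)); // 0.003
```

The spread is 50× between the most and least expensive way to bill the same token count, which is why the techniques below focus on output length and cache hits.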
Check Your Cost Breakdown in the Anthropic Console
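# You can also pull usage programmatically via the Admin API usage report
# (requires an Admin API key, which is separate from a regular API key)
curl "https://api.anthropic.com/v1/organizations/usage_report/messages?starting_at=2026-04-01T00:00:00Z" \
  -H "x-api-key: $ANTHROPIC_ADMIN_KEY" \
  -H "anthropic-version: 2023-06-01"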
The first step is knowing which model you’re hitting and how many tokens you’re consuming.
Technique 1: Cut Input Costs by 10× with Prompt Caching
The highest-impact optimization available. Add one cache_control field to your system prompt, and every repeat call bills that prompt at 1/10 the standard input rate.
How It Works
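Anthropic’s prompt cache stores the prefix of a request that you mark with cache_control (here, the system prompt). The first call writes that prefix to the cache at 1.25× the standard input rate ($18.75/1M on Opus); every later call that resends the identical prefix within 5 minutes reads it back at 1/10 the rate ($1.50/1M). Each cache hit resets the 5-minute TTL, so a steady stream of calls keeps the cache warm. Only the cached prefix is discounted: the user message and the output are still billed at full price.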
Implementation
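import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
// Placeholders: your long shared context (~8,000 tokens here) and the per-call request.
// Prompts below the model's minimum cacheable length are not cached.
const longProjectContext = "<your project docs, conventions, and glossary>";
const prompt = "Write today's article draft.";
// ❌ No cache: the whole system prompt is charged at $15/1M on every call
const uncached = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  system: "You are an expert on the XXX project.\n" + longProjectContext,
  messages: [{ role: "user", content: prompt }],
});
// ✅ With cache: first call writes the prefix ($18.75/1M), later calls read it ($1.50/1M, 90% off)
const cached = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on the XXX project.\n" + longProjectContext,
      cache_control: { type: "ephemeral" }, // ← add just this
    },
  ],
  messages: [{ role: "user", content: prompt }],
});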
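To confirm caching is actually working, inspect the `usage` object on each response: `cache_creation_input_tokens` counts tokens written to the cache, and `cache_read_input_tokens` counts tokens served from it. Here is a small sketch that classifies a response. The `cacheStatus` helper is hypothetical, and the sample objects stand in for real `usage` values.

```typescript
// Cache-related fields on a Messages API response's `usage` object
interface CacheUsage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

type CacheStatus = "hit" | "write" | "not cached";

// "hit" wins if any tokens were read from cache, even if a new segment was also written
function cacheStatus(usage: CacheUsage): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "not cached";
}

// Sample values shaped like a first call and a repeat call within the TTL
const first: CacheUsage = { input_tokens: 40, cache_creation_input_tokens: 8_000, cache_read_input_tokens: 0 };
const repeat: CacheUsage = { input_tokens: 40, cache_creation_input_tokens: 0, cache_read_input_tokens: 8_000 };

console.log(cacheStatus(first));  // write
console.log(cacheStatus(repeat)); // hit
```

If repeat calls keep reporting "write" or "not cached", the prefix is changing between calls (even a timestamp in the system prompt breaks it) or the gap between calls exceeds the TTL.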
Real Savings (This Site)
Setup: 3 articles/day × 10 calls per article × 8,000-token system prompt on Opus ($15/1M standard input)
Before optimization:
3 articles × 10 calls × 8,000 tokens × $15/1M = $3.60/day → $108/month
After optimization (with caching):
First write: 3 calls × 8,000 tokens × $18.75/1M = $0.45/day
27 cache reads: 27 × 8,000 tokens × $1.50/1M = $0.32/day
Total: $0.77/day → $23/month
Savings: $85/month (79% reduction)
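The arithmetic above is easy to rerun for your own workload. This sketch reproduces it with the Opus rates from the pricing table. Like the numbers above, it assumes all calls for one article land inside the cache's 5-minute window, so only the first call per article pays the cache-write rate. The `Workload` shape and function names are illustrative.

```typescript
// Opus rates in USD per 1M tokens, from the pricing table
const OPUS = { input: 15, cacheWrite: 18.75, cacheRead: 1.5 };

interface Workload {
  articlesPerDay: number;
  callsPerArticle: number;
  systemPromptTokens: number;
}

// Without caching, every call pays the standard input rate for the system prompt
function dailyCostUncached(w: Workload): number {
  const calls = w.articlesPerDay * w.callsPerArticle;
  return (calls * w.systemPromptTokens * OPUS.input) / 1_000_000;
}

// With caching, the first call per article writes the cache and the rest read it
function dailyCostCached(w: Workload): number {
  const writes = w.articlesPerDay;
  const reads = w.articlesPerDay * (w.callsPerArticle - 1);
  return (
    (writes * w.systemPromptTokens * OPUS.cacheWrite +
      reads * w.systemPromptTokens * OPUS.cacheRead) / 1_000_000
  );
}

const site: Workload = { articlesPerDay: 3, callsPerArticle: 10, systemPromptTokens: 8_000 };
console.log(dailyCostUncached(site).toFixed(2)); // 3.60
console.log(dailyCostCached(site).toFixed(2));   // 0.77
```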
Watch out for cache misses: a cached prefix expires 5 minutes after it was last used. For batch processing, run calls that share a system prompt back to back so no gap between them exceeds 5 minutes.
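One way to act on this, sketched under the assumption that jobs are plain objects carrying their system prompt (the `Job` type and `groupBySystemPrompt` are hypothetical): bucket jobs by system prompt, then run each bucket back to back. Anthropic's prompt-caching docs also note that a cache entry only becomes available once the first response begins, so it helps to send the first call of each bucket before firing the rest in parallel.

```typescript
// Hypothetical job: which system prompt it needs and what to ask
interface Job {
  systemPrompt: string;
  prompt: string;
}

// Group jobs sharing a system prompt, preserving first-seen order (Map keeps insertion order)
function groupBySystemPrompt(jobs: Job[]): Map<string, Job[]> {
  const groups = new Map<string, Job[]>();
  for (const job of jobs) {
    const bucket = groups.get(job.systemPrompt) ?? [];
    bucket.push(job);
    groups.set(job.systemPrompt, bucket);
  }
  return groups;
}

const jobs: Job[] = [
  { systemPrompt: "A", prompt: "outline" },
  { systemPrompt: "B", prompt: "translate" },
  { systemPrompt: "A", prompt: "draft" },
];

for (const [system, batch] of groupBySystemPrompt(jobs)) {
  console.log(system, batch.map((j) => j.prompt).join(", "));
}
// A outline, draft
// B translate
```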
Technique 2: Match the Model to the Task
Using Opus for everything is like delivering pizza in a Porsche.
Decision Framework
type TaskComplexity = "complex" | "standard" | "simple";
function getModel(task: TaskComplexity): string {
return {
complex: "claude-opus-4-6", // Architecture, hard debugging, code review
standard: "claude-sonnet-4-6", // General implementation, refactoring
simple: "claude-haiku-4-5-20251001", // Translation, formatting, classification, summaries
}[task];
}
Translation Example (This Site’s Multilingual Pipeline)
// Translating an article into 9 languages (output tokens only; input cost ignored)
// translateWithModel and article stand in for the site's own helper and input (not shown)
// ❌ Translate with Opus: $75/1M × 2,000 output tokens × 9 languages = $1.35/article
const opusTranslations = await translateWithModel("claude-opus-4-6", article);
// ✅ Translate with Haiku: $4/1M × 2,000 output tokens × 9 languages = $0.072/article
const haikuTranslations = await translateWithModel("claude-haiku-4-5-20251001", article);
// Savings: $1.35 → $0.072 (94.7% reduction; in our pipeline, translation quality was practically equivalent)
Switching 3 articles/day × 9 languages to Haiku: $121/month → $6.50/month (94% reduction)
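A quick way to rerun the translation math for other models or article counts. Like the comments above, it counts output tokens only and assumes a 30-day month; `translationCost` is an illustrative helper.

```typescript
// Output-token cost (USD) of translating one article into `languages` languages
function translationCost(outputRatePerM: number, tokensPerLanguage: number, languages: number): number {
  return (outputRatePerM * tokensPerLanguage * languages) / 1_000_000;
}

const opus = translationCost(75, 2_000, 9); // Opus output rate
const haiku = translationCost(4, 2_000, 9); // Haiku output rate
const articlesPerDay = 3;
const days = 30;

console.log(opus.toFixed(3), haiku.toFixed(3));          // 1.350 0.072
console.log((opus * articlesPerDay * days).toFixed(2));  // 121.50
console.log((haiku * articlesPerDay * days).toFixed(2)); // 6.48
```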
Technique 3: Deliberately Constrain Output Tokens
Output tokens cost 5× as much as input tokens, yet many pipelines accept needlessly verbose responses.
Prompt Techniques to Constrain Output
❌ "Tell me what's wrong with this code"
→ Lengthy explanation returned (1,000 tokens)
✅ "List the problems in this code as bullet points, max 3 items, max 2 lines each"
→ Concise answer (200 tokens)
Effect on Opus ($75/1M output): 80% fewer output tokens = cost $0.075 → $0.015 per call
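If many prompts need the same limits, it can help to generate the constraint text instead of retyping it. Here is a minimal sketch: `constrainOutput` is a hypothetical helper that appends the bullet, item, and line limits from the ✅ example, and `opusOutputCost` prices output at Opus's $75/1M rate to reproduce the per-call figures.

```typescript
// Hypothetical helper: append explicit format and length limits to a task
function constrainOutput(task: string, maxItems: number, maxLinesEach: number): string {
  return `${task} as bullet points, max ${maxItems} items, max ${maxLinesEach} lines each`;
}

// Output cost in USD at Opus's $75 per 1M output tokens
function opusOutputCost(outputTokens: number): number {
  return (outputTokens * 75) / 1_000_000;
}

console.log(constrainOutput("List the problems in this code", 3, 2));
// List the problems in this code as bullet points, max 3 items, max 2 lines each
console.log(opusOutputCost(1_000).toFixed(3), opusOutputCost(200).toFixed(3)); // 0.075 0.015
```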
Set max_tokens Appropriately
// ❌ max_tokens is required on every call, and 4096 everywhere is overkill for most tasks
const res = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096, // a cap, not a charge: you pay for tokens generated, but a high cap lets verbose answers run long
  messages: [{ role: "user", content: prompt }],
});
// ✅ Tune per use-case (a cap that is too low truncates the answer, so leave headroom)
const configs = {
  codeReview: { max_tokens: 512 }, // issues only
  bugAnalysis: { max_tokens: 1024 }, // cause + fix
  implementFeature: { max_tokens: 4096 }, // full implementation
  summarize: { max_tokens: 256 }, // summary only
};
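A tight cap only saves money if you notice when it bites. The Messages API reports `stop_reason: "max_tokens"` when a response was cut off at the cap, so a pipeline can check for it. This sketch is pure logic over a response-shaped object; `wasTruncated` and the doubling policy in `nextMaxTokens` are hypothetical helpers, not SDK features.

```typescript
// Minimal shape of the fields read from a Messages API response
interface ResponseLike {
  stop_reason: string | null;
  usage: { output_tokens: number };
}

// True when generation stopped at the max_tokens cap, i.e. the answer is incomplete
function wasTruncated(res: ResponseLike): boolean {
  return res.stop_reason === "max_tokens";
}

// Hypothetical retry policy: double the cap, never above a hard ceiling.
// Returns null when the cap cannot be raised any further.
function nextMaxTokens(current: number, ceiling = 4096): number | null {
  const next = Math.min(current * 2, ceiling);
  return next > current ? next : null;
}

const review: ResponseLike = { stop_reason: "max_tokens", usage: { output_tokens: 512 } };
if (wasTruncated(review)) {
  console.log(`Truncated at ${review.usage.output_tokens} tokens; retry with`, nextMaxTokens(512));
}
// Truncated at 512 tokens; retry with 1024
```

Doubling once keeps the common case cheap while still recovering the occasional long answer; a persistent truncation rate is a sign the per-task cap itself is set too low.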
Technique 4: Isolate Context with Sub-Agents
In long conversation sessions, the growing history is resent as input every turn, inflating costs. Delegating to a sub-agent resets the context.
// When the main conversation grows long, offload heavy work to a sub-agent
// Illustrative pseudocode: translateInCurrentContext and Agent stand in for Claude Code's behavior; they are not SDK functions
// ❌ Translate inside the main context: the entire conversation history is sent each time
const inlineTranslation = await translateInCurrentContext(article);
// ✅ Delegate to a sub-agent: runs with a fresh context
const delegatedTranslation = await Agent({
  subagent_type: "general-purpose",
  prompt: `Translate the following article into English:\n\n${article}`,
  // ← no prior conversation history; only the article is the input
});
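Claude Code’s Agent tool works exactly this way: the sub-agent starts from a fresh context, and only its final result returns to the main conversation. For one-off tasks such as translation, search, and file operations, sub-agent delegation is the golden rule.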
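Why this matters for cost: the Messages API is stateless, so each turn of a long session resends the whole history as input. The rough sketch below compares total billed input tokens both ways. It assumes every turn adds the same number of tokens and ignores the model's replies (which also join the history), so real sessions grow faster. Both functions are illustrative, not part of Claude Code.

```typescript
// Turn k resends turns 1..k, so billed input grows quadratically with session length
function inputTokensWithHistory(turnTokens: number[]): number {
  let history = 0;
  let billed = 0;
  for (const t of turnTokens) {
    history += t;
    billed += history;
  }
  return billed;
}

// Each task runs in a fresh context (sub-agent), so only its own tokens are billed
function inputTokensFresh(turnTokens: number[]): number {
  return turnTokens.reduce((sum, t) => sum + t, 0);
}

const turns = [2_000, 2_000, 2_000, 2_000, 2_000];
console.log(inputTokensWithHistory(turns)); // 30000
console.log(inputTokensFresh(turns));       // 10000
```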
Technique 5: Monitor Costs and Set Budget Alerts
Finally: know your costs and put a ceiling on them. This is your safety net against runaway billing.
Setting Up in the Anthropic Console
- Go to Anthropic Console → Usage Limits
- Set a Monthly budget (e.g., $50/month)
- Set an Alert threshold (e.g., notify at $40)
Cost Tracking in Code
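import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Log the usage object from every response to track spend.
// Field names match the SDK's `usage` object (snake_case); the cache fields can be null.
interface UsageLike {
  input_tokens: number; // uncached input tokens
  output_tokens: number;
  cache_read_input_tokens?: number | null; // billed at the cache-read rate
  cache_creation_input_tokens?: number | null; // billed at the cache-write rate
}
type Rate = { input: number; cacheRead: number; cacheWrite: number; output: number };
// USD per 1M tokens from the pricing table (cache write = 1.25× standard input)
const rates: Record<string, Rate> = {
  "claude-opus-4-6": { input: 15, cacheRead: 1.5, cacheWrite: 18.75, output: 75 },
  "claude-sonnet-4-6": { input: 3, cacheRead: 0.3, cacheWrite: 3.75, output: 15 },
  "claude-haiku-4-5": { input: 0.8, cacheRead: 0.08, cacheWrite: 1, output: 4 },
};
function calculateCost(usage: UsageLike, model: string): number {
  const rate = rates[model];
  if (!rate) throw new Error(`No pricing configured for ${model}`);
  return (
    (usage.input_tokens * rate.input +
      (usage.cache_read_input_tokens ?? 0) * rate.cacheRead +
      (usage.cache_creation_input_tokens ?? 0) * rate.cacheWrite +
      usage.output_tokens * rate.output) / 1_000_000
  );
}
const res = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 256,
  messages: [{ role: "user", content: "Summarize our caching setup in one sentence." }],
});
const cost = calculateCost(res.usage, "claude-opus-4-6");
console.log(`This call cost: $${cost.toFixed(4)}`);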
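The Console limit is the real backstop, but a pipeline can also stop itself before it gets there. This sketch is a hypothetical in-process guard that accumulates per-call costs (for example, values returned by `calculateCost`) and reports when spend crosses the $40 alert and $50 limit used in the Console example. It keeps state only in memory, so a long-running job would need to persist the running total somewhere.

```typescript
type BudgetState = "ok" | "alert" | "over";

// Hypothetical guard mirroring the Console settings: alert threshold and hard limit in USD
class BudgetGuard {
  private spent = 0;
  private readonly alertAt: number;
  private readonly limit: number;

  constructor(alertAt: number, limit: number) {
    this.alertAt = alertAt;
    this.limit = limit;
  }

  // Record one call's cost and report where total spend now stands
  record(costUsd: number): BudgetState {
    this.spent += costUsd;
    if (this.spent >= this.limit) return "over";
    if (this.spent >= this.alertAt) return "alert";
    return "ok";
  }

  get total(): number {
    return this.spent;
  }
}

const guard = new BudgetGuard(40, 50);
console.log(guard.record(35)); // ok
console.log(guard.record(6));  // alert
console.log(guard.record(10)); // over
```

A caller would typically log on "alert" and refuse to start new API calls on "over".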
Summary: Stack Your Savings
| Technique | Reduction | Difficulty |
|---|---|---|
| Prompt caching | up to 90% | Low (add 1 line) |
| Model selection | up to 95% | Low–Medium |
| Output token limits | 30–80% | Low (prompt tuning) |
| Sub-agent delegation | 20–50% | Medium |
| Budget alerts | Prevents blowouts | Low |
Our results on this site:
Before optimization: $450/month (all tasks on Opus, no caching)
After optimization: $45/month (Haiku for translation, Opus with caching, output limits)
Savings: $405/month (90% reduction)
The single best first step you can take today: add cache_control: { type: "ephemeral" } to your system prompt. That alone bills every repeated system prompt at 1/10 the standard input rate. Then introduce the remaining techniques one at a time.
Related Articles
- 7 Practical Techniques to Optimize Claude Code Token Usage
- 10 Dangerous Prompt Patterns in Claude Code
- Complete Guide to Harness Engineering
About the Author
Masa
Engineer obsessed with Claude Code. Runs claudecode-lab.com, a 10-language tech media with 2,000+ pages.