Claude Code API Cost Mastery: 5 Proven Techniques to Cut Bills from $450 to $45/Month

Real numbers behind Claude Code API pricing. Learn how prompt caching, model selection, output limits, and sub-agent delegation achieved a 90% cost reduction, from $450 to $45 per month.

“I used Claude Code every day last month and got a $450 API bill”—this is a story more and more engineers are telling. Claude Code is powerful, but costs can vary by 10× or more depending on how you use it.

On this site (claudecode-lab.com) we auto-generate three multilingual articles every day with Claude Code. In the first week we burned through $380, but after optimization we’re doing the same workload for under $40/month. Here’s every step that achieved a 90% reduction.

First: Understand Where You’re Being Charged

To cut costs you need to know exactly what you’re paying for.

Claude API Cost = Input tokens × Input rate + Output tokens × Output rate

Pricing by Model (as of April 2026)

Model               Input (standard)   Input (cache read)   Output
claude-opus-4-6     $15/1M             $1.50/1M             $75/1M
claude-sonnet-4-6   $3/1M              $0.30/1M             $15/1M
claude-haiku-4-5    $0.80/1M           $0.08/1M             $4/1M

Two critical insights:

  1. Output is 5× more expensive than input → trimming output alone gives a massive reduction
  2. Cache reads are 1/10 the price of standard input → caching is your biggest lever

Check Your Cost Breakdown in the Anthropic Console

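# You can also pull usage via the Admin API. This needs an Admin API key,
# not a regular API key; ANTHROPIC_ADMIN_KEY is an assumed variable name
# and the start date is only an example.
curl "https://api.anthropic.com/v1/organizations/usage_report/messages?starting_at=2026-04-01T00:00:00Z&bucket_width=1d&group_by[]=model" \
  -H "x-api-key: $ANTHROPIC_ADMIN_KEY" \
  -H "anthropic-version: 2023-06-01"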

The first step is knowing which model you’re hitting and how many tokens you’re consuming.

Technique 1: Cut Input Costs by 10× with Prompt Caching

The highest-impact optimization available. Add a single cache_control field to your system prompt and the cached portion of your input is billed at 1/10 the standard rate on every cache hit.

How It Works

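Anthropic’s prompt cache bills a repeated prompt prefix at the cache-read rate ($1.50/1M on Opus, versus $15/1M standard) when the same content is resent within the cache’s 5-minute TTL. The first call pays a write premium ($18.75/1M on Opus), and each cache hit restarts the 5-minute clock, so a steady stream of calls keeps the prefix cached. Prompts shorter than the model’s minimum cacheable length are not cached at all.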

Implementation

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Placeholders: substitute your own project context and request
const longProjectContext = "<project docs, conventions, file map>";
const prompt = "<the user request>";

// ❌ No cache: the full system prompt is charged at $15/1M on every call
const uncached = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  system: "You are an expert on the XXX project.\n" + longProjectContext,
  messages: [{ role: "user", content: prompt }],
});

// ✅ With cache: cache hits within the TTL bill the cached prefix at $1.50/1M (90% off)
const cached = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on the XXX project.\n" + longProjectContext,
      cache_control: { type: "ephemeral" },  // ← add just this
    },
  ],
  messages: [{ role: "user", content: prompt }],
});
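
To confirm the cache is actually being hit, one option is to inspect the usage object on each response. The sketch below assumes the cache_creation_input_tokens and cache_read_input_tokens fields that the Messages API reports alongside input_tokens; the CacheUsage interface and the cacheStatus helper are illustrative names, not part of the SDK.

```typescript
// Minimal local shape of the usage fields relevant to caching.
// (Assumption: mirrors the snake_case fields found on `response.usage`.)
interface CacheUsage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

type CacheStatus = "hit" | "write" | "none";

// Hypothetical helper: classify a response by what happened to the cached prefix.
function cacheStatus(usage: CacheUsage): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";       // prefix served from cache
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write"; // prefix written to cache
  return "none";                                                    // caching did not apply
}
```

Called as cacheStatus(cached.usage) after each request, a steady run of "write" or "none" would suggest the prefix is changing between calls, the calls are too far apart, or the prompt is below the minimum cacheable length.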

Real Savings (This Site)

3 articles/day × 8,000-token system prompt × Opus $15/1M

Before optimization:
  3 articles × 10 calls × 8,000 tokens × $15/1M = $3.60/day → $108/month

After optimization (with caching):
  First write: 3 calls × 8,000 tokens × $18.75/1M = $0.45/day
  27 cache reads: 27 × 8,000 tokens × $1.50/1M = $0.32/day
  Total: $0.77/day → $23/month

Savings: $85/month (79% reduction)
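
The arithmetic above can be reproduced for other workloads with a small calculator. This is a sketch that uses the Opus rates quoted in this article and assumes one cache write per article, with every remaining call for that article hitting the cache; the Workload type and function names are made up for illustration.

```typescript
// Rates in USD per 1M tokens, taken from the Opus figures in this article.
const OPUS_INPUT = 15;
const OPUS_CACHE_WRITE = 18.75;
const OPUS_CACHE_READ = 1.5;

interface Workload {
  articlesPerDay: number;
  callsPerArticle: number;
  systemPromptTokens: number;
}

// Daily cost of sending the system prompt uncached on every call.
function dailyCostUncached(w: Workload): number {
  const calls = w.articlesPerDay * w.callsPerArticle;
  return (calls * w.systemPromptTokens * OPUS_INPUT) / 1_000_000;
}

// Daily cost with caching: one write per article, the rest are reads.
function dailyCostCached(w: Workload): number {
  const writes = w.articlesPerDay;
  const reads = w.articlesPerDay * (w.callsPerArticle - 1);
  return (
    (writes * w.systemPromptTokens * OPUS_CACHE_WRITE +
      reads * w.systemPromptTokens * OPUS_CACHE_READ) /
    1_000_000
  );
}

const site: Workload = { articlesPerDay: 3, callsPerArticle: 10, systemPromptTokens: 8_000 };
const monthlyUncached = dailyCostUncached(site) * 30; // about $108
const monthlyCached = dailyCostCached(site) * 30;     // about $23.22
```

Because the write rate is higher than the standard rate, a prefix that is written but never read back costs more than not caching at all; the savings come entirely from the reads.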

Watch out for cache misses: the cache expires 5 minutes after it was last used. For batch processing, group the calls that share a system prompt so that no gap between consecutive calls exceeds 5 minutes.
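
To see how call spacing affects the write/read split, the following sketch simulates a single cached prefix with a 5-minute lifetime that restarts on every use. It is a simplified model rather than a description of the service's internals, and tallyCacheUse is a hypothetical helper.

```typescript
const TTL_MS = 5 * 60 * 1000; // 5-minute cache lifetime, restarted on each use

interface CacheTally {
  writes: number;
  reads: number;
}

// Given call timestamps (ms, ascending) that all share one cached prefix,
// count how many calls pay the write rate and how many pay the read rate.
function tallyCacheUse(callTimesMs: number[]): CacheTally {
  let writes = 0;
  let reads = 0;
  let lastUse = -Infinity;
  for (const t of callTimesMs) {
    if (t - lastUse <= TTL_MS) {
      reads++;  // prefix still cached: billed at the read rate
    } else {
      writes++; // cache expired or never existed: prefix is written again
    }
    lastUse = t; // both writes and hits restart the 5-minute clock
  }
  return { writes, reads };
}

const minutes = (m: number): number => m * 60 * 1000;

// Ten calls one minute apart: 1 write, 9 reads.
const grouped = tallyCacheUse([0, 1, 2, 3, 4, 5, 6, 7, 8, 9].map(minutes));
// Ten calls six minutes apart: 10 writes, 0 reads.
const spread = tallyCacheUse([0, 6, 12, 18, 24, 30, 36, 42, 48, 54].map(minutes));
```

The second schedule is worse than not caching, since every call pays the write premium and none of them benefits from a read.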

Technique 2: Match the Model to the Task

Using Opus for everything is like delivering pizza in a Porsche.

Decision Framework

type TaskComplexity = "complex" | "standard" | "simple";

function getModel(task: TaskComplexity): string {
  return {
    complex: "claude-opus-4-6",        // Architecture, hard debugging, code review
    standard: "claude-sonnet-4-6",     // General implementation, refactoring
    simple: "claude-haiku-4-5-20251001", // Translation, formatting, classification, summaries
  }[task];
}
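
One way to apply this framework consistently is to route by task kind instead of choosing a complexity tier by hand at each call site. The sketch below is an assumption about how that could look: the TaskKind labels are lifted from the comments above and the function names are hypothetical.

```typescript
type TaskComplexity = "complex" | "standard" | "simple";

// Hypothetical task labels, taken from the comments in the framework above.
type TaskKind =
  | "architecture" | "debugging" | "code-review"
  | "implementation" | "refactoring"
  | "translation" | "formatting" | "classification" | "summary";

const COMPLEXITY_BY_KIND: Record<TaskKind, TaskComplexity> = {
  architecture: "complex",
  debugging: "complex",
  "code-review": "complex",
  implementation: "standard",
  refactoring: "standard",
  translation: "simple",
  formatting: "simple",
  classification: "simple",
  summary: "simple",
};

const MODEL_BY_COMPLEXITY: Record<TaskComplexity, string> = {
  complex: "claude-opus-4-6",
  standard: "claude-sonnet-4-6",
  simple: "claude-haiku-4-5-20251001",
};

// Resolve a task kind to a model id in one step.
function modelForKind(kind: TaskKind): string {
  return MODEL_BY_COMPLEXITY[COMPLEXITY_BY_KIND[kind]];
}
```

Keeping the mapping in one table means a tier can be changed for a whole class of tasks (for example, moving code review to Sonnet) without touching the callers.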

Translation Example (This Site’s Multilingual Pipeline)

// Translating an article into 9 languages
// (translateWithModel is this pipeline's own helper, not an SDK function)

// ❌ Translate with Opus: $75/1M × 2,000 output tokens × 9 languages = $1.35/article
const opusTranslations = await translateWithModel("claude-opus-4-6", article);

// ✅ Translate with Haiku: $4/1M × 2,000 output tokens × 9 languages = $0.072/article
const haikuTranslations = await translateWithModel("claude-haiku-4-5-20251001", article);

// Savings: $1.35 → $0.072 (94.7% reduction; translation quality was practically equivalent for our articles)

Switching 3 articles/day × 9 languages to Haiku, counting output tokens only over a 30-day month: $121/month → $6.50/month (94.7% reduction)
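
The same projection can be run for any model in the pricing table. This sketch uses the output rates listed in this article and, like the figures above, ignores input tokens; monthlyTranslationCost and its default arguments are illustrative.

```typescript
// Output rates in USD per 1M tokens, from the pricing table in this article.
const OUTPUT_RATE = {
  "claude-opus-4-6": 75,
  "claude-sonnet-4-6": 15,
  "claude-haiku-4-5": 4,
} as const;

type ModelName = keyof typeof OUTPUT_RATE;

// Monthly output cost of translating every article into every language.
function monthlyTranslationCost(
  model: ModelName,
  outputTokensPerTranslation: number = 2_000,
  languages: number = 9,
  articlesPerDay: number = 3,
  daysPerMonth: number = 30,
): number {
  const tokens = outputTokensPerTranslation * languages * articlesPerDay * daysPerMonth;
  return (tokens * OUTPUT_RATE[model]) / 1_000_000;
}

const opusMonthly = monthlyTranslationCost("claude-opus-4-6");    // about $121.50
const sonnetMonthly = monthlyTranslationCost("claude-sonnet-4-6"); // about $24.30
const haikuMonthly = monthlyTranslationCost("claude-haiku-4-5");   // about $6.48
```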

Technique 3: Deliberately Constrain Output Tokens

Output costs 5× more than input, yet many pipelines accept unnecessarily verbose responses.

Prompt Techniques to Constrain Output

❌ "Tell me what's wrong with this code"
   → Lengthy explanation returned (1,000 tokens)

✅ "List the problems in this code as bullet points, max 3 items, max 2 lines each"
   → Concise answer (200 tokens)

Effect: 80% fewer output tokens, so output cost drops from $0.075 to $0.015 per call at the Opus rate ($75/1M)

Set max_tokens Appropriately

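// ❌ A blanket 4096 cap is overkill for most tasks.
// max_tokens is a required parameter and only caps the response: you are billed
// for the tokens actually generated, but a high cap leaves room for long answers.
const res = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,  // response may run up to 4,096 output tokens
  messages: [{ role: "user", content: prompt }],
});

// ✅ Tune per use-case
const configs = {
  codeReview:       { max_tokens: 512  },  // issues only
  bugAnalysis:      { max_tokens: 1024 },  // cause + fix
  implementFeature: { max_tokens: 4096 },  // full implementation
  summarize:        { max_tokens: 256  },  // summary only
};
// A response cut off at the cap comes back with stop_reason "max_tokens"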
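
A tight cap can cut a response off mid-answer, so it helps to detect truncation instead of silently accepting it. The sketch below relies on the Messages API reporting stop_reason "max_tokens" when the cap is reached; the ResponseLike shape, the nextCap policy of doubling up to a ceiling, and the ceiling value are assumptions for illustration.

```typescript
// Per-use-case caps, copied from the configs above.
const MAX_TOKENS = {
  codeReview: 512,
  bugAnalysis: 1024,
  implementFeature: 4096,
  summarize: 256,
} as const;

type UseCase = keyof typeof MAX_TOKENS;

// Minimal local shape of the one response field this check needs.
interface ResponseLike {
  stop_reason: string | null;
}

// True when generation stopped because it ran into the max_tokens cap.
function wasTruncated(res: ResponseLike): boolean {
  return res.stop_reason === "max_tokens";
}

// Hypothetical policy: after a truncation, suggest double the cap (up to a
// ceiling) for the retry; otherwise keep the configured cap.
function nextCap(useCase: UseCase, res: ResponseLike, ceiling: number = 8192): number {
  const current = MAX_TOKENS[useCase];
  return wasTruncated(res) ? Math.min(current * 2, ceiling) : current;
}
```

If a use case truncates often, that is a signal to raise its cap or tighten the prompt, since a retry pays for the input a second time.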

Technique 4: Isolate Context with Sub-Agents

In long conversation sessions, the growing history is resent as input every turn, inflating costs. Delegating to a sub-agent resets the context.

// When the main conversation grows long, offload heavy work to a sub-agent.
// Pseudocode: translateInCurrentContext and Agent(...) stand in for work done in
// the main session and for Claude Code's Agent tool call; neither is an SDK function.

// ❌ Translate inside the main context: entire conversation history sent each time
const inlineTranslation = await translateInCurrentContext(article);

// ✅ Delegate to a sub-agent: runs with a fresh context
const delegatedTranslation = await Agent({
  subagent_type: "general-purpose",
  prompt: `Translate the following article into English:\n\n${article}`,
  // ← no prior conversation history, only the article is the input
});

Claude Code’s Agent tool works exactly this way. For “spot” tasks—translation, search, file operations—sub-agent delegation is the golden rule.
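
To get a feel for why long sessions get expensive, the sketch below totals the input tokens billed when every turn resends the whole history. It is a deliberately simplified model: it ignores caching and the fixed overhead each sub-agent carries (its own system prompt and tool definitions), and the function name is hypothetical.

```typescript
// Total input tokens billed across one session when every turn resends all
// earlier turns. `turnTokens[i]` is the size of the content added by turn i.
function cumulativeInputTokens(turnTokens: number[]): number {
  let history = 0;
  let billed = 0;
  for (const t of turnTokens) {
    history += t;      // this turn's content joins the context
    billed += history; // the whole context is sent as input
  }
  return billed;
}

// Twenty turns of 2,000 tokens each in one session: 420,000 input tokens.
const inSession = cumulativeInputTokens(new Array<number>(20).fill(2_000));
// The same twenty tasks run as isolated calls: 40,000 input tokens.
const isolated = 20 * 2_000;
```

The in-session total grows with the square of the number of turns, which is why spot tasks that do not need the history are the natural candidates for delegation.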

Technique 5: Monitor Costs and Set Budget Alerts

Finally: know your costs and put a ceiling on them. This is your safety net against runaway billing.

Setting Up in the Anthropic Console

  1. Go to Anthropic Console → Usage Limits
  2. Set a Monthly budget (e.g., $50/month)
  3. Set an Alert threshold (e.g., notify at $40)

Cost Tracking in Code

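// Log the usage object from every response to track spend.
// Field names match the API's usage object (snake_case); cache fields may be null.
interface CostTracker {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number | null;
  cache_creation_input_tokens: number | null;
}

// USD per 1M tokens
interface Rates { input: number; cacheRead: number; cacheWrite: number; output: number }

const rates: Record<string, Rates> = {
  "claude-opus-4-6": { input: 15, cacheRead: 1.5, cacheWrite: 18.75, output: 75 },
};

function calculateCost(usage: CostTracker, model: string): number {
  const rate = rates[model];
  if (!rate) throw new Error(`No rates configured for model: ${model}`);
  return (
    (usage.input_tokens * rate.input +
     (usage.cache_read_input_tokens ?? 0) * rate.cacheRead +
     (usage.cache_creation_input_tokens ?? 0) * rate.cacheWrite +
     usage.output_tokens * rate.output) / 1_000_000
  );
}

const res = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
});
const cost = calculateCost(res.usage, "claude-opus-4-6");
console.log(`This call cost: $${cost.toFixed(4)}`);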
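
Console alerts can be complemented with a guard inside the pipeline itself, so a runaway loop stops before the next request is sent. This is a minimal sketch that assumes per-call costs are computed as above; BudgetGuard and its thresholds are hypothetical and it tracks spend only for the current process.

```typescript
type BudgetSignal = "ok" | "alert" | "stop";

class BudgetGuard {
  private spentUsd = 0;
  private alerted = false;
  private readonly limitUsd: number;
  private readonly alertUsd: number;

  constructor(limitUsd: number, alertUsd: number) {
    this.limitUsd = limitUsd;
    this.alertUsd = alertUsd;
  }

  // Record one call's cost and report what the caller should do next.
  record(costUsd: number): BudgetSignal {
    this.spentUsd += costUsd;
    if (this.spentUsd >= this.limitUsd) return "stop"; // hard ceiling reached
    if (this.spentUsd >= this.alertUsd && !this.alerted) {
      this.alerted = true; // raise the alert only once
      return "alert";
    }
    return "ok";
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// Mirrors the console settings above: $50 budget, alert at $40.
const guard = new BudgetGuard(50, 40);
```

A caller would pass each result of calculateCost to guard.record and stop issuing requests once it returns "stop"; persisting the running total between runs is left out of this sketch.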

Summary: Stack Your Savings

Technique              Reduction           Difficulty
Prompt caching         up to 90%           Low (add 1 line)
Model selection        up to 95%           Low–Medium
Output token limits    30–80%              Low (prompt tuning)
Sub-agent delegation   20–50%              Medium
Budget alerts          Prevents blowouts   Low

Our results on this site:

Before optimization: $450/month (all tasks on Opus, no caching)
After optimization:  $45/month  (Haiku for translation, Opus with caching, output limits)
Savings: $405/month (90% reduction)

The single best first step you can take today: add cache_control: { type: "ephemeral" } to your system prompt. On every cache hit, that bills the cached prompt at 1/10 the standard input rate. Introduce the remaining techniques one by one after that.

About the Author

Masa

Engineer obsessed with Claude Code. Runs claudecode-lab.com, a 10-language tech media with 2,000+ pages.