Claude Code/API Cost Control: Token Budgets, Alerts, and Limits

Using Claude Code in day-to-day development gets easier when you can predict what it will cost. Cost is not determined by the model alone. It depends on how many files Claude read, how large the context is, how long the response is, how often the prompt cache hit, and whether you are on a Claude subscription or on pay-as-you-go Anthropic API billing.

This guide was updated on 2026-06-03 against official sources: Anthropic API pricing, Claude Code cost management, Prompt caching, Token counting, and the Usage and Cost API. Prices can change, so recheck the official page before a purchase or a client estimate.

Understanding cost in plain terms

Term	Plain meaning	Effect on cost
Token	The unit used to count what Claude reads and writes	Long files, logs, prompts, and generated code raise spend
Context	Conversation history, files read, CLAUDE.md, and tool definitions	Old context adds cost to every later request as well
Prompt cache	Reusing the same prompt prefix	On a hit, repeated input becomes much cheaper
Budget guard	A limit per task, day, user, or workspace	Stops unwanted spend

estimated cost = input tokens * input rate
               + cache write tokens * cache write rate
               + cache read tokens * cache read rate
               + output tokens * output rate

As of 2026-06-03, official pricing lists Sonnet 4.6 at $3/MTok input and $15/MTok output. Haiku 4.5 is $1 and $5. Opus 4.8/4.7/4.6 is $5 and $25. Cache reads cost 10% of the base input price, and a 5-minute cache write costs 1.25x the base input price.
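As a worked instance of the formula above, here is a minimal sketch that plugs the Sonnet 4.6 rates from this section into one request. The helper name `estimateCostUsd` and the token counts are hypothetical, chosen only to show how a cached prefix changes the total.

```javascript
// Rates in USD per million tokens (MTok), taken from the pricing paragraph above.
// cacheWrite5m = 1.25 x input, cacheRead = 10% of input.
const SONNET_46 = { input: 3, output: 15, cacheWrite5m: 3.75, cacheRead: 0.3 };

// Hypothetical helper: applies the four-term cost formula to raw token counts.
function estimateCostUsd(tokens, rate) {
  const weighted =
    (tokens.input ?? 0) * rate.input +
    (tokens.cacheWrite ?? 0) * rate.cacheWrite5m +
    (tokens.cacheRead ?? 0) * rate.cacheRead +
    (tokens.output ?? 0) * rate.output;
  return weighted / 1_000_000; // rates are per million tokens
}

// Illustrative request: 20k fresh input, 50k tokens read from cache, 1.5k output.
const cached = estimateCostUsd({ input: 20_000, cacheRead: 50_000, output: 1_500 }, SONNET_46);
// Same request with no cache hit: all 70k input tokens billed at the base rate.
const uncached = estimateCostUsd({ input: 70_000, output: 1_500 }, SONNET_46);

console.log({ cached: cached.toFixed(4), uncached: uncached.toFixed(4) });
```

With these made-up numbers the cached request comes to about $0.10 and the uncached one to about $0.23, which is why the cache terms are worth tracking separately.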

New users often forget output cost. Instead of "explain everything in detail", it is better to write "at most 5 findings ordered by severity, 2 lines per finding".

Cost control loop

flowchart LR
  A["Define the task"] --> B["Shrink the input"]
  B --> C["Choose a model"]
  C --> D["Estimate tokens"]
  D --> E{"Within budget?"}
  E -- "Yes" --> F["Run Claude"]
  E -- "No" --> B
  F --> G["Log usage"]
  G --> H{"Threshold crossed?"}
  H -- "Yes" --> I["Stop, alert, or switch model"]
  H -- "No" --> A

In Claude Code, check /usage and /context first. Run /clear when you move to unrelated work. Run /compact when you need to keep decisions but not the full history. The amount shown by /usage is a local estimate; check the Console for final billing. For a team or organization, pull a daily report from the Usage and Cost API.

Example 1: monthly estimator

This script makes no API calls. Given daily usage in MTok, it estimates monthly cost.

// claude-cost-estimator.mjs
// Usage: node claude-cost-estimator.mjs <model> <workDays> <inputMTok> <outputMTok> <cacheReadMTok>
// Rates are USD per million tokens (MTok). Cache writes are not modeled here.
const RATES = {
  opus48: { input: 5, output: 25, cacheRead: 0.5 },
  sonnet46: { input: 3, output: 15, cacheRead: 0.3 },
  haiku45: { input: 1, output: 5, cacheRead: 0.1 },
};

// Positional arguments with defaults; token amounts are daily usage in MTok.
const [model = "sonnet46", days = "22", input = "0.25", output = "0.06", cacheRead = "0.20"] =
  process.argv.slice(2);

if (!RATES[model]) {
  throw new Error(`Unknown model: ${model}`);
}

const rate = RATES[model];
// Daily cost = MTok used * USD per MTok, summed over each token class.
const dailyUsd =
  Number(input) * rate.input +
  Number(output) * rate.output +
  Number(cacheRead) * rate.cacheRead;

console.log({
  model,
  workDays: Number(days),
  dailyUsd: Number(dailyUsd.toFixed(4)),
  monthlyUsd: Number((dailyUsd * Number(days)).toFixed(2)),
});

node claude-cost-estimator.mjs sonnet46 22 0.25 0.06 0.20
node claude-cost-estimator.mjs haiku45 22 0.25 0.06 0.20

This is not an exact bill. Its job is to tell you whether a workflow looks like $15/month, $50/month, or $500/month. Keep a 20-30% buffer for tools, retries, cache writes, and long responses.
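To make the buffer concrete, the sketch below applies 20% and 30% to the monthly figure the estimator produces for its default Sonnet inputs: (0.25 x 3 + 0.06 x 15 + 0.20 x 0.3) x 22 = $37.62. The `withBuffer` helper is hypothetical and not part of the script above.

```javascript
// Hypothetical helper: inflate a monthly estimate by a percentage buffer
// and round to cents.
function withBuffer(monthlyUsd, bufferPct) {
  return Number((monthlyUsd * (1 + bufferPct / 100)).toFixed(2));
}

// Base figure from the estimator's default sonnet46 inputs over 22 work days.
const baseMonthlyUsd = 37.62;

const planningRange = {
  low: withBuffer(baseMonthlyUsd, 20),
  high: withBuffer(baseMonthlyUsd, 30),
};

console.log(planningRange);
```

For planning, the upper end of the range is usually the safer number to quote.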

Example 2: API call with a daily budget

This example counts tokens before sending the request, writes actual usage to a JSONL file after the response, and stops before the daily budget would be exceeded.

npm init -y
npm i @anthropic-ai/sdk

// budgeted-message.mjs
import Anthropic from "@anthropic-ai/sdk";
import fs from "node:fs";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const model = process.env.CLAUDE_MODEL ?? "claude-sonnet-4-6";
const maxTokens = Number(process.env.MAX_TOKENS ?? 700);
const dailyBudgetUsd = Number(process.env.DAILY_BUDGET_USD ?? 5);

const RATES = {
  "claude-opus-4-8": { input: 5, output: 25, cacheWrite5m: 6.25, cacheRead: 0.5 },
  "claude-sonnet-4-6": { input: 3, output: 15, cacheWrite5m: 3.75, cacheRead: 0.3 },
  "claude-haiku-4-5": { input: 1, output: 5, cacheWrite5m: 1.25, cacheRead: 0.1 },
};

// Convert the usage object from a response into USD.
// All cache writes are priced at the 5-minute rate.
function usdFromUsage(usage, rate) {
  return (
    (usage.input_tokens ?? 0) * rate.input +
    (usage.output_tokens ?? 0) * rate.output +
    (usage.cache_creation_input_tokens ?? 0) * rate.cacheWrite5m +
    (usage.cache_read_input_tokens ?? 0) * rate.cacheRead
  ) / 1_000_000;
}

// Sum today's logged spend from the JSONL file. "Today" is the UTC date.
function todayTotalUsd(path) {
  if (!fs.existsSync(path)) return 0;
  const today = new Date().toISOString().slice(0, 10);
  return fs.readFileSync(path, "utf8")
    .trim()
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    .filter((row) => row.date === today)
    .reduce((sum, row) => sum + row.usd, 0);
}

const system = "You are a concise senior code reviewer. Return only actionable findings.";
const messages = [
  { role: "user", content: "List only the top three bug risks in this TypeScript function." },
];

const rate = RATES[model];
if (!rate) throw new Error(`No rate table for ${model}`);

// Count the system prompt too, so the pre-flight estimate covers the whole input.
const counted = await anthropic.messages.countTokens({ model, system, messages });
// Worst case: all input billed at the cache-write rate and the response uses all of max_tokens.
const worstCaseUsd =
  (counted.input_tokens * rate.cacheWrite5m + maxTokens * rate.output) / 1_000_000;
const logPath = "claude-usage.jsonl";

if (todayTotalUsd(logPath) + worstCaseUsd > dailyBudgetUsd) {
  throw new Error(`Budget stop: projected daily spend exceeds $${dailyBudgetUsd}`);
}

const response = await anthropic.messages.create({
  model,
  max_tokens: maxTokens,
  cache_control: { type: "ephemeral" },
  system,
  messages,
});

const usd = usdFromUsage(response.usage, rate);
fs.appendFileSync(logPath, JSON.stringify({
  date: new Date().toISOString().slice(0, 10),
  model,
  usd: Number(usd.toFixed(6)),
  usage: response.usage,
}) + "\n");

console.log({ id: response.id, usd: Number(usd.toFixed(6)), usage: response.usage });

export ANTHROPIC_API_KEY=sk-ant-...
DAILY_BUDGET_USD=5 node budgeted-message.mjs

To explain it to the team, put it this way: weigh the request before you send it, keep the receipt after the response, and stop before the daily limit.

Example 3: team usage report

Organization accounts can view daily usage through the Admin Usage and Cost API. This requires an Admin API key; a normal API key or an individual account is not enough.

curl "https://api.anthropic.com/v1/organizations/usage_report/messages?\
starting_at=2026-06-01T00:00:00Z&\
ending_at=2026-06-08T00:00:00Z&\
group_by[]=model&\
bucket_width=1d" \
  --header "anthropic-version: 2023-06-01" \
  --header "x-api-key: $ANTHROPIC_ADMIN_KEY"

Signal	Risk	Action
Opus share	Easy work is going to the premium model	Route summaries, translation, and formatting to Sonnet/Haiku
Output tokens	Responses are too long	Limit findings, lines, and `max_tokens`
Cache reads	`cache_read_input_tokens` is close to zero	Remove timestamps and random values from the cached prefix
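The three signals in the table can also be computed locally. The sketch below assumes rows shaped like the `claude-usage.jsonl` entries written in Example 2 (`model`, `usd`, and a `usage` object); it does not parse the Admin API response, whose shape is not shown here. The function name `summarizeUsage` and the sample rows are hypothetical.

```javascript
// Hypothetical helper: derive the three table signals from parsed JSONL rows.
function summarizeUsage(rows) {
  let totalUsd = 0;
  let opusUsd = 0;
  let input = 0;
  let output = 0;
  let cacheRead = 0;
  let cacheWrite = 0;

  for (const row of rows) {
    const usage = row.usage ?? {};
    totalUsd += row.usd ?? 0;
    // Assumption: Opus model IDs contain the substring "opus".
    if (String(row.model).includes("opus")) opusUsd += row.usd ?? 0;
    input += usage.input_tokens ?? 0;
    output += usage.output_tokens ?? 0;
    cacheRead += usage.cache_read_input_tokens ?? 0;
    cacheWrite += usage.cache_creation_input_tokens ?? 0;
  }

  // All prompt-side tokens: uncached input plus cache reads and cache writes.
  const promptTokens = input + cacheRead + cacheWrite;
  return {
    opusShare: totalUsd > 0 ? opusUsd / totalUsd : 0,
    cacheHitRatio: promptTokens > 0 ? cacheRead / promptTokens : 0,
    avgOutputTokens: rows.length > 0 ? output / rows.length : 0,
  };
}

// Made-up sample rows, for illustration only.
const sampleRows = [
  { model: "claude-opus-4-8", usd: 0.5, usage: { input_tokens: 1000, output_tokens: 400 } },
  {
    model: "claude-haiku-4-5",
    usd: 0.5,
    usage: { input_tokens: 1000, output_tokens: 200, cache_read_input_tokens: 2000 },
  },
];

const signals = summarizeUsage(sampleRows);
console.log(signals);
```

A cache hit ratio near zero on a workflow that is supposed to reuse a prefix is the cue to inspect that prefix for changing values.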

Three practical use cases

Solo developer: Keep Sonnet as the default. Save Opus for architecture decisions, hard debugging, or critical reviews. Run /clear as soon as the task changes.

Content and localization: Make the style guide and glossary a stable cached prefix and change only the article body. For large asynchronous jobs, look at the Batch API's 50% discount.

Training and team rollout: Concurrency spikes on training day. Set the daily budget, alert threshold, and prompt rules in advance. For structured team enablement, see /training/.
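For the content and localization case, one way to keep the prefix stable is to put the style guide and glossary in `system` blocks and mark the last one with `cache_control`, so only the user message changes between articles. The following is a sketch under assumptions: `buildLocalizationRequest` and the sample strings are hypothetical, and caching only takes effect once the prefix reaches the model's minimum cacheable length (see the Prompt caching documentation).

```javascript
// Hypothetical helper: build Messages API parameters with a stable, cacheable prefix.
function buildLocalizationRequest({ model, styleGuide, glossary, articleBody, maxTokens }) {
  return {
    model,
    max_tokens: maxTokens,
    // Stable prefix: identical text on every call, with no timestamps or random IDs.
    // The cache breakpoint on the last block covers both system blocks.
    system: [
      { type: "text", text: styleGuide },
      { type: "text", text: glossary, cache_control: { type: "ephemeral" } },
    ],
    // Only this part varies from article to article.
    messages: [{ role: "user", content: articleBody }],
  };
}

// Placeholder inputs; real prefixes would be much longer.
const shared = {
  model: "claude-sonnet-4-6",
  styleGuide: "Use short sentences. Prefer active voice.",
  glossary: "token = unit of text; context = conversation state",
  maxTokens: 700,
};

const requestA = buildLocalizationRequest({ ...shared, articleBody: "Translate article A." });
const requestB = buildLocalizationRequest({ ...shared, articleBody: "Translate article B." });

// The prefix is byte-identical across requests, which is what a cache hit requires.
console.log(JSON.stringify(requestA.system) === JSON.stringify(requestB.system));
```

Each object can be passed to `anthropic.messages.create(...)` as in Example 2; comparing `cache_read_input_tokens` across calls then shows whether the prefix is actually being reused.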

Common pitfalls

An API key can change how you are billed. Official help says that ANTHROPIC_API_KEY can take priority over the logged-in subscription in Claude Code. Check with /status.

The cache is not measured. The prompt cache depends on a stable prefix. Putting a timestamp, UUID, or dynamic file list in the system prompt can lower the hit rate.

Output is left unbounded. Reviews need a findings limit, summaries a length limit, and code generation a file scope.

Old prices get copied. Model names and prices change. Check the official pricing page before publishing a calculator, proposal, or training material.

An unofficial cheap proxy is used. If model identity, log retention, and credential handling are not clear, the discount is not a real saving.

Reusable budget sheets, prompt templates, and checklists are available at /products/.

Hands-on result

In ClaudeCodeLab's content workflows, the biggest gains came from basic controls: caching common instructions, sending translation and formatting to Haiku/Sonnet, limiting Opus to high-judgment tasks, and writing every API usage record to JSONL. Start with an alert at 80%, a stop at 100%, and clearing context when the task changes.
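The 80% alert and 100% stop rule can be sketched as a small pure function. The name `budgetAction` and its return labels are hypothetical; how an alert is delivered (log line, chat message, email) is left to the caller.

```javascript
// Hypothetical helper: map current spend to an action label.
// alertRatio defaults to 0.8, matching the 80% alert suggested above.
function budgetAction(spentUsd, budgetUsd, alertRatio = 0.8) {
  if (!(budgetUsd > 0)) throw new Error("budgetUsd must be a positive number");
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 1) return "stop"; // 100% or more: refuse further calls
  if (ratio >= alertRatio) return "alert"; // 80% or more: warn but continue
  return "ok";
}

console.log(budgetAction(2.0, 5)); // 40% of budget
console.log(budgetAction(4.0, 5)); // 80% of budget
console.log(budgetAction(5.0, 5)); // 100% of budget
```

In the Example 2 script, a check like this could run on `todayTotalUsd(logPath)` after each logged response, so the warning arrives before the hard stop does.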
