
Claude Code × Amazon Bedrock Complete Guide | Running Claude in Production on AWS

Complete guide to using Amazon Bedrock with Claude Code. From IAM authentication, streaming, Lambda integration, RAG implementation, to cost optimization — based on Masa's real production experience.

If you’ve hit a wall with “I want to use Claude API in production but I’m worried about API key management” or “Our internal security policy won’t allow data to leave AWS” — Amazon Bedrock is the answer.

When I was integrating AI into an API server on ECS for work, I initially used the Anthropic API directly. But a security review flagged “management of API keys to external services” as a concern. After switching to Bedrock, authentication was handled entirely through IAM roles, and I was freed from API key management. This article covers everything from implementing Bedrock with Claude Code to production operations.


What is Amazon Bedrock?

Amazon Bedrock is a managed AI model service from AWS. You can call multiple models — Claude (Anthropic), Llama (Meta), Titan (Amazon) — through a unified API.

Why Use Bedrock?

| Aspect | Anthropic API | Amazon Bedrock |
|---|---|---|
| Authentication | API key | AWS IAM role |
| Billing | Directly to Anthropic | Integrated into AWS billing |
| VPC support | None | Fully private with PrivateLink |
| Data retention | Anthropic's policy | AWS's policy |
| Compliance | SOC 2, etc. | SOC 2 / ISO 27001 / HIPAA, etc. |

Anthropic API is convenient for personal projects, but for enterprise, finance, and healthcare use cases, Bedrock is increasingly the only option.


Step 1: Initial Setup

Requesting Model Access

First, request access to Claude models in the AWS console.

# Check the list of available models
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --region us-east-1 \
  --query 'modelSummaries[].{id:modelId, name:modelName}'

# Sample output
[
  {"id": "anthropic.claude-opus-4-5",     "name": "Claude Opus 4.5"},
  {"id": "anthropic.claude-sonnet-4-6",   "name": "Claude Sonnet 4.6"},
  {"id": "anthropic.claude-haiku-4-5-20251001", "name": "Claude Haiku 4.5"}
]

Important: The primary available regions are us-east-1 (Virginia) and us-west-2 (Oregon). Tokyo region can be used via Cross-region inference.

SDK Installation

npm install @anthropic-ai/bedrock-sdk @aws-sdk/client-bedrock-agent-runtime

Step 2: Basic Implementation

Anthropic ships official Bedrock support as a dedicated SDK package for TypeScript. Since the message syntax is nearly identical to the regular Anthropic API, the migration cost from existing code is minimal.

// src/lib/bedrock-client.ts
import AnthropicBedrock from "@anthropic-ai/bedrock-sdk";

// No explicit credentials needed when an IAM role is attached (e.g., Lambda/ECS)
export const bedrock = new AnthropicBedrock({
  awsRegion: process.env.AWS_REGION ?? "us-east-1",
  // The AWS CLI profile / default credential chain is used during local development
});

export async function generateText(
  prompt: string,
  options: { model?: string; maxTokens?: number } = {}
): Promise<string> {
  const { model = "anthropic.claude-sonnet-4-6", maxTokens = 1024 } = options;

  const response = await bedrock.messages.create({
    model,
    max_tokens: maxTokens,
    messages: [{ role: "user", content: prompt }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Bedrock model IDs differ from the Anthropic API:

Anthropic API: claude-sonnet-4-6
Bedrock:       anthropic.claude-sonnet-4-6  (prefix added)
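Since the only difference is the `anthropic.` prefix, a small helper can translate IDs when migrating existing code. This is an illustrative utility, not part of either SDK:

```typescript
// Translate an Anthropic API model ID to its Bedrock equivalent.
// Illustrative helper: assumes the simple "anthropic." prefix convention shown above.
export function toBedrockModelId(anthropicId: string): string {
  return anthropicId.startsWith("anthropic.")
    ? anthropicId // already a Bedrock ID
    : `anthropic.${anthropicId}`;
}
```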

Streaming Support

Streaming is essential for long responses.

// src/lib/bedrock-stream.ts
import { bedrock } from "./bedrock-client"; // reuses the client exported from bedrock-client.ts

export async function* streamText(
  prompt: string,
  model = "anthropic.claude-sonnet-4-6"
): AsyncGenerator<string> {
  const stream = await bedrock.messages.stream({
    model,
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });

  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      yield chunk.delta.text;
    }
  }
}

// Usage example (Next.js App Router)
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for await (const text of streamText(prompt)) {
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    // Raw text chunks, not SSE-framed events, so use a plain text content type
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
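On the client side, the streamed body can be consumed incrementally with a ReadableStream reader. A minimal sketch (the route path and DOM handling are assumptions, not from the original code):

```typescript
// Read a streamed text response chunk by chunk.
// Works with any ReadableStream<Uint8Array>, e.g. the body of a fetch() response.
export async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text); // e.g. append to the DOM as tokens arrive
  }
  return full;
}
```

Usage in the browser might look like `const res = await fetch("/api/generate", { method: "POST", body: JSON.stringify({ prompt }) }); await readTextStream(res.body!, (t) => output.append(t));` (hypothetical route and element names).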

Step 3: Lambda + Bedrock Pattern

The most common architecture for providing serverless AI features.

claude -p "
Implement the following Lambda function in src/lambda/ai-handler.ts:
- Accept prompt and maxTokens from the event
- Call Bedrock (claude-sonnet-4-6) and return the result
- Handle errors: ThrottlingException (retry) and ValidationException (400)
- Log execution time
- Initialize client outside the handler (cold start optimization)
"
// src/lambda/ai-handler.ts
import { Handler, APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import AnthropicBedrock from "@anthropic-ai/bedrock-sdk";

// Initialize at module scope (cached on container reuse)
const bedrock = new AnthropicBedrock({
  awsRegion: process.env.AWS_REGION,
});

export const handler: Handler<APIGatewayProxyEvent, APIGatewayProxyResult> = async (event) => {
  const startTime = Date.now();

  try {
    const { prompt, maxTokens = 512 } = JSON.parse(event.body ?? "{}");

    if (!prompt) {
      return { statusCode: 400, body: JSON.stringify({ error: "prompt is required" }) };
    }

    const response = await bedrock.messages.create({
      model: "anthropic.claude-sonnet-4-6",
      max_tokens: maxTokens,
      messages: [{ role: "user", content: prompt }],
    });

    const duration = Date.now() - startTime;
    console.log(JSON.stringify({
      level: "INFO",
      duration_ms: duration,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
    }));

    return {
      statusCode: 200,
      body: JSON.stringify({
        text: response.content[0].type === "text" ? response.content[0].text : "",
        usage: response.usage,
      }),
    };
  } catch (error: any) {
    if (error.name === "ThrottlingException") {
      console.warn("Rate limited by Bedrock, client should retry");
      return { statusCode: 429, body: JSON.stringify({ error: "Rate limited, please retry" }) };
    }
    console.error("Bedrock error:", error);
    return { statusCode: 500, body: JSON.stringify({ error: "AI generation failed" }) };
  }
};
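Instead of returning 429 immediately, throttling can also be absorbed inside the function with exponential backoff. A generic wrapper along these lines is a sketch, not part of any SDK; tune the attempt count and delays against your Lambda timeout budget:

```typescript
// Retry a call when it fails with ThrottlingException, using exponential
// backoff with jitter. Any other error is rethrown immediately.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      if (error?.name !== "ThrottlingException") throw error; // only retry throttling
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Wrapping the Bedrock call is then `const response = await withRetry(() => bedrock.messages.create({ ... }));`.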

Lambda IAM Policy

// IAM configuration with CDK
import * as iam from "aws-cdk-lib/aws-iam";

lambdaFunction.addToRolePolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream",
  ],
  resources: [
    `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6`,
    `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001`,
  ],
}));

Step 4: RAG (Retrieval-Augmented Generation) Implementation

A pattern where Claude reads internal documents or product information to answer questions.

claude -p "
Implement a RAG system using Bedrock Knowledge Base.

Architecture:
- Store documents in S3
- Index with Bedrock Knowledge Base vector indexing
- Retrieve documents based on user questions
- Generate answers with Claude Sonnet

Implement with TypeScript + AWS SDK v3.
Get Knowledge Base ID from the KNOWLEDGE_BASE_ID environment variable.
"
// src/lib/rag.ts
import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const agentClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function ragQuery(question: string): Promise<{
  answer: string;
  citations: string[];
}> {
  const response = await agentClient.send(
    new RetrieveAndGenerateCommand({
      input: { text: question },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId: process.env.KNOWLEDGE_BASE_ID!,
          modelArn: `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6`,
          retrievalConfiguration: {
            vectorSearchConfiguration: { numberOfResults: 5 },
          },
        },
      },
    })
  );

  const answer = response.output?.text ?? "";
  const citations = (response.citations ?? [])
    .flatMap((c) => c.retrievedReferences ?? [])
    .map((r) => r.location?.s3Location?.uri ?? "")
    .filter(Boolean);

  return { answer, citations };
}
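To surface sources to end users, the returned citations can be appended to the answer. A small, purely illustrative formatting helper:

```typescript
// Append deduplicated source URIs to a RAG answer as a "Sources" footer.
export function formatRagAnswer(answer: string, citations: string[]): string {
  const unique = [...new Set(citations)].filter(Boolean);
  if (unique.length === 0) return answer;
  const sources = unique.map((uri, i) => `[${i + 1}] ${uri}`).join("\n");
  return `${answer}\n\nSources:\n${sources}`;
}
```

This pairs directly with the `{ answer, citations }` shape returned by ragQuery above.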

Step 5: Cost Optimization

// Model selection utility
type TaskType = "classify" | "extract" | "summarize" | "generate" | "complex";

const MODEL_MAP: Record<TaskType, string> = {
  classify: "anthropic.claude-haiku-4-5-20251001",  // $0.80/1M input
  extract:  "anthropic.claude-haiku-4-5-20251001",
  summarize: "anthropic.claude-sonnet-4-6",          // $3.00/1M input
  generate: "anthropic.claude-sonnet-4-6",
  complex:  "anthropic.claude-opus-4-5",             // $15.00/1M input
};

export function selectModel(task: TaskType): string {
  return MODEL_MAP[task];
}
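The per-model input rates in the comments above can drive a rough per-request cost estimate. The rates below are copied from this article's comments, not authoritative pricing; verify against the current Bedrock price list:

```typescript
// Rough input-cost estimate in USD, using the per-1M-token rates quoted above.
// These rates are examples from this article, not authoritative pricing.
const INPUT_RATE_PER_M: Record<string, number> = {
  "anthropic.claude-haiku-4-5-20251001": 0.8,
  "anthropic.claude-sonnet-4-6": 3.0,
  "anthropic.claude-opus-4-5": 15.0,
};

export function estimateInputCost(modelId: string, inputTokens: number): number {
  const rate = INPUT_RATE_PER_M[modelId];
  if (rate === undefined) throw new Error(`Unknown model: ${modelId}`);
  return (inputTokens / 1_000_000) * rate;
}
```

Combined with selectModel, this makes it easy to log an estimated cost next to each request's token usage.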

Reduce Input Costs with Prompt Caching

// Prompt caching is also available in Bedrock
const response = await bedrock.messages.create({
  model: "anthropic.claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: longSystemPrompt,
      cache_control: { type: "ephemeral" },  // Cache for 5 minutes
    },
  ],
  messages: [{ role: "user", content: userQuery }],
});
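Whether the cache is actually being hit can be checked from the usage block of the response. The `cache_creation_input_tokens` / `cache_read_input_tokens` field names below follow the Anthropic Messages API; treat this as a sketch and confirm the exact fields your SDK version returns:

```typescript
// Summarize cache effectiveness from a Messages API usage object.
// Field names follow the Anthropic API; confirm them for your SDK version.
interface UsageLike {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Fraction of total input tokens that were served from the prompt cache.
export function cacheHitRatio(usage: UsageLike): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const total =
    usage.input_tokens + read + (usage.cache_creation_input_tokens ?? 0);
  return total === 0 ? 0 : read / total;
}
```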

5 Common Pitfalls

1. Region not supported

Claude on Bedrock is not available in all regions. As of 2026, us-east-1 and us-west-2 are the primary regions. To use it from Tokyo, enable Cross-region inference.

// Cross-region inference uses an inference profile ID (with a geo prefix
// such as "us."), not a plain foundation-model ID
const crossRegionModelId = "us.anthropic.claude-sonnet-4-6";

2. Forgetting to request model access

In Bedrock, you must request “Model access” for each model you want to use. Calling a model without requesting access will result in an AccessDeniedException. Always request access before coding with Claude Code.

3. Lambda timeout too short

Claude responses can take 10–30 seconds. The Lambda default of 3 seconds will almost certainly time out. Set it to at least 30 seconds, and 60–300 seconds for longer generations.
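In CDK, the timeout is one line on the function definition. A sketch assuming a NodejsFunction construct inside a Stack (the construct ID and memory size are illustrative):

```typescript
// CDK sketch: give the AI handler enough time for long generations.
// This goes inside a Stack constructor, where `this` is the scope.
import { Duration } from "aws-cdk-lib";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";

const aiHandler = new NodejsFunction(this, "AiHandler", {
  entry: "src/lambda/ai-handler.ts",
  timeout: Duration.seconds(60), // default is 3s — far too short for Claude
  memorySize: 512,
});
```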

4. Confusing Bedrock model IDs with Anthropic API IDs

❌ Using the Anthropic API ID directly: "claude-sonnet-4-6"
✅ Bedrock ID: "anthropic.claude-sonnet-4-6"

5. Not accounting for Cross-region inference latency

Calling models in us-east-1 from Tokyo adds round-trip network latency (approximately 100–200ms). For latency-sensitive applications, use streaming to reduce perceived delay.


Summary

| Task | Claude Code's Contribution |
|---|---|
| Basic implementation | Generates AnthropicBedrock client and functions |
| Lambda integration | Generates handler and IAM policy together |
| RAG implementation | Auto-generates Knowledge Base integration code |
| Cost optimization | Designs model selection logic by task type |
| Troubleshooting | Identifies cause and suggests fix from error logs |

Develop with Claude Code, run in production on Bedrock — this combination satisfies security, cost, and scalability requirements all at once. Bedrock has no upfront commitment, so start small with pay-as-you-go usage, and when you're ready to go to production, all you need is to configure the IAM role.


#claude-code #aws #bedrock #anthropic #typescript #generative-ai



About the Author

Masa

Engineer obsessed with Claude Code. Runs claudecode-lab.com, a 10-language tech media with 2,000+ pages.