
Claude Code × Amazon Bedrock Complete Guide | Running Claude in Production on AWS

Complete guide to using Amazon Bedrock with Claude Code. From IAM authentication, streaming, Lambda integration, RAG implementation, to cost optimization — based on Masa's real production experience.

If you’ve hit a wall with “I want to use Claude API in production but I’m worried about API key management” or “Our internal security policy won’t allow data to leave AWS” — Amazon Bedrock is the answer.

When I was integrating AI into an API server on ECS for work, I initially used the Anthropic API directly. But a security review flagged “management of API keys to external services” as a concern. After switching to Bedrock, authentication was handled entirely through IAM roles, and I was freed from API key management. This article covers everything from implementing Bedrock with Claude Code to production operations.


What is Amazon Bedrock?

Amazon Bedrock is a managed AI model service from AWS. You can call multiple models — Claude (Anthropic), Llama (Meta), Titan (Amazon) — through a unified API.

Why Use Bedrock?

| Aspect | Anthropic API | Amazon Bedrock |
|---|---|---|
| Authentication | API key | AWS IAM role |
| Billing | Directly to Anthropic | Integrated into AWS billing |
| VPC support | None | Fully private with PrivateLink |
| Data retention | Anthropic's policy | AWS's policy |
| Compliance | SOC 2, etc. | SOC 2 / ISO 27001 / HIPAA, etc. |

Anthropic API is convenient for personal projects, but for enterprise, finance, and healthcare use cases, Bedrock is increasingly the only option.


Step 1: Initial Setup

Requesting Model Access

First, request access to Claude models in the AWS console.

# Check the list of available models
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --region us-east-1 \
  --query 'modelSummaries[].{id:modelId, name:modelName}'

# Sample output
[
  {"id": "anthropic.claude-opus-4-5",     "name": "Claude Opus 4.5"},
  {"id": "anthropic.claude-sonnet-4-6",   "name": "Claude Sonnet 4.6"},
  {"id": "anthropic.claude-haiku-4-5-20251001", "name": "Claude Haiku 4.5"}
]

Important: The primary available regions are us-east-1 (Virginia) and us-west-2 (Oregon). Tokyo region can be used via Cross-region inference.

SDK Installation

npm install @anthropic-ai/bedrock-sdk @aws-sdk/client-bedrock-agent-runtime

Step 2: Basic Implementation

Anthropic ships official Bedrock support as a dedicated SDK package for TypeScript. Since the message syntax is nearly identical to the regular Anthropic API, the migration cost from existing code is minimal.

// src/lib/bedrock-client.ts
import AnthropicBedrock from "@anthropic-ai/bedrock-sdk";

// No explicit credentials needed when an IAM role is attached (e.g., Lambda/ECS)
export const bedrock = new AnthropicBedrock({
  awsRegion: process.env.AWS_REGION ?? "us-east-1",
  // The AWS CLI profile / default credential chain is used during local development
});

export async function generateText(
  prompt: string,
  options: { model?: string; maxTokens?: number } = {}
): Promise<string> {
  const { model = "anthropic.claude-sonnet-4-6", maxTokens = 1024 } = options;

  const response = await bedrock.messages.create({
    model,
    max_tokens: maxTokens,
    messages: [{ role: "user", content: prompt }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Bedrock model IDs differ from the Anthropic API:

Anthropic API: claude-sonnet-4-6
Bedrock:       anthropic.claude-sonnet-4-6  (prefix added)
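Since the only difference is the `anthropic.` prefix, a small helper can translate IDs when migrating existing code. This is an illustrative utility, not part of either SDK:

```typescript
// Translate an Anthropic API model ID to its Bedrock equivalent.
// Illustrative helper: assumes the simple "anthropic." prefix convention shown above.
export function toBedrockModelId(anthropicId: string): string {
  return anthropicId.startsWith("anthropic.")
    ? anthropicId // already a Bedrock ID
    : `anthropic.${anthropicId}`;
}
```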

Streaming Support

Streaming is essential for long responses.

// src/lib/bedrock-stream.ts
import { bedrock } from "./bedrock-client"; // reuses the client exported from bedrock-client.ts

export async function* streamText(
  prompt: string,
  model = "anthropic.claude-sonnet-4-6"
): AsyncGenerator<string> {
  const stream = await bedrock.messages.stream({
    model,
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });

  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      yield chunk.delta.text;
    }
  }
}

// Usage example (Next.js App Router)
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for await (const text of streamText(prompt)) {
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    // Raw text chunks, not SSE-framed events, so use a plain text content type
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
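On the client side, the streamed body can be consumed incrementally with a ReadableStream reader. A minimal sketch (the route path and DOM handling are assumptions, not from the original code):

```typescript
// Read a streamed text response chunk by chunk.
// Works with any ReadableStream<Uint8Array>, e.g. the body of a fetch() response.
export async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text); // e.g. append to the DOM as tokens arrive
  }
  return full;
}
```

Usage in the browser might look like `const res = await fetch("/api/generate", { method: "POST", body: JSON.stringify({ prompt }) }); await readTextStream(res.body!, (t) => output.append(t));` (hypothetical route and element names).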

Step 3: Lambda + Bedrock Pattern

The most common architecture for providing serverless AI features.

claude -p "
Implement the following Lambda function in src/lambda/ai-handler.ts:
- Accept prompt and maxTokens from the event
- Call Bedrock (claude-sonnet-4-6) and return the result
- Handle errors: ThrottlingException (retry) and ValidationException (400)
- Log execution time
- Initialize client outside the handler (cold start optimization)
"
// src/lambda/ai-handler.ts
import { Handler, APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import AnthropicBedrock from "@anthropic-ai/bedrock-sdk";

// Initialize at module scope (cached on container reuse)
const bedrock = new AnthropicBedrock({
  awsRegion: process.env.AWS_REGION,
});

export const handler: Handler<APIGatewayProxyEvent, APIGatewayProxyResult> = async (event) => {
  const startTime = Date.now();

  try {
    const { prompt, maxTokens = 512 } = JSON.parse(event.body ?? "{}");

    if (!prompt) {
      return { statusCode: 400, body: JSON.stringify({ error: "prompt is required" }) };
    }

    const response = await bedrock.messages.create({
      model: "anthropic.claude-sonnet-4-6",
      max_tokens: maxTokens,
      messages: [{ role: "user", content: prompt }],
    });

    const duration = Date.now() - startTime;
    console.log(JSON.stringify({
      level: "INFO",
      duration_ms: duration,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
    }));

    return {
      statusCode: 200,
      body: JSON.stringify({
        text: response.content[0].type === "text" ? response.content[0].text : "",
        usage: response.usage,
      }),
    };
  } catch (error: any) {
    if (error.name === "ThrottlingException") {
      console.warn("Rate limited by Bedrock, client should retry");
      return { statusCode: 429, body: JSON.stringify({ error: "Rate limited, please retry" }) };
    }
    console.error("Bedrock error:", error);
    return { statusCode: 500, body: JSON.stringify({ error: "AI generation failed" }) };
  }
};
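Instead of returning 429 immediately, throttling can also be absorbed inside the function with exponential backoff. A generic wrapper along these lines is a sketch, not part of any SDK; tune the attempt count and delays against your Lambda timeout budget:

```typescript
// Retry a call when it fails with ThrottlingException, using exponential
// backoff with jitter. Any other error is rethrown immediately.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      if (error?.name !== "ThrottlingException") throw error; // only retry throttling
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Wrapping the Bedrock call is then `const response = await withRetry(() => bedrock.messages.create({ ... }));`.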

Lambda IAM Policy

// IAM configuration with CDK
import * as iam from "aws-cdk-lib/aws-iam";

lambdaFunction.addToRolePolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream",
  ],
  resources: [
    `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6`,
    `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001`,
  ],
}));

Step 4: RAG (Retrieval-Augmented Generation) Implementation

A pattern where Claude reads internal documents or product information to answer questions.

claude -p "
Implement a RAG system using Bedrock Knowledge Base.

Architecture:
- Store documents in S3
- Index with Bedrock Knowledge Base vector indexing
- Retrieve documents based on user questions
- Generate answers with Claude Sonnet

Implement with TypeScript + AWS SDK v3.
Get Knowledge Base ID from the KNOWLEDGE_BASE_ID environment variable.
"
// src/lib/rag.ts
import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

const agentClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });

export async function ragQuery(question: string): Promise<{
  answer: string;
  citations: string[];
}> {
  const response = await agentClient.send(
    new RetrieveAndGenerateCommand({
      input: { text: question },
      retrieveAndGenerateConfiguration: {
        type: "KNOWLEDGE_BASE",
        knowledgeBaseConfiguration: {
          knowledgeBaseId: process.env.KNOWLEDGE_BASE_ID!,
          modelArn: `arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6`,
          retrievalConfiguration: {
            vectorSearchConfiguration: { numberOfResults: 5 },
          },
        },
      },
    })
  );

  const answer = response.output?.text ?? "";
  const citations = (response.citations ?? [])
    .flatMap((c) => c.retrievedReferences ?? [])
    .map((r) => r.location?.s3Location?.uri ?? "")
    .filter(Boolean);

  return { answer, citations };
}
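To surface sources to end users, the returned citations can be appended to the answer. A small, purely illustrative formatting helper:

```typescript
// Append deduplicated source URIs to a RAG answer as a "Sources" footer.
export function formatRagAnswer(answer: string, citations: string[]): string {
  const unique = [...new Set(citations)].filter(Boolean);
  if (unique.length === 0) return answer;
  const sources = unique.map((uri, i) => `[${i + 1}] ${uri}`).join("\n");
  return `${answer}\n\nSources:\n${sources}`;
}
```

This pairs directly with the `{ answer, citations }` shape returned by ragQuery above.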

Step 5: Cost Optimization

// Model selection utility
type TaskType = "classify" | "extract" | "summarize" | "generate" | "complex";

const MODEL_MAP: Record<TaskType, string> = {
  classify: "anthropic.claude-haiku-4-5-20251001",  // $0.80/1M input
  extract:  "anthropic.claude-haiku-4-5-20251001",
  summarize: "anthropic.claude-sonnet-4-6",          // $3.00/1M input
  generate: "anthropic.claude-sonnet-4-6",
  complex:  "anthropic.claude-opus-4-5",             // $15.00/1M input
};

export function selectModel(task: TaskType): string {
  return MODEL_MAP[task];
}
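The per-model input rates in the comments above can drive a rough per-request cost estimate. The rates below are copied from this article's comments, not authoritative pricing; verify against the current Bedrock price list:

```typescript
// Rough input-cost estimate in USD, using the per-1M-token rates quoted above.
// These rates are examples from this article, not authoritative pricing.
const INPUT_RATE_PER_M: Record<string, number> = {
  "anthropic.claude-haiku-4-5-20251001": 0.8,
  "anthropic.claude-sonnet-4-6": 3.0,
  "anthropic.claude-opus-4-5": 15.0,
};

export function estimateInputCost(modelId: string, inputTokens: number): number {
  const rate = INPUT_RATE_PER_M[modelId];
  if (rate === undefined) throw new Error(`Unknown model: ${modelId}`);
  return (inputTokens / 1_000_000) * rate;
}
```

Combined with selectModel, this makes it easy to log an estimated cost next to each request's token usage.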

Reduce Input Costs with Prompt Caching

// Prompt caching is also available in Bedrock
const response = await bedrock.messages.create({
  model: "anthropic.claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: longSystemPrompt,
      cache_control: { type: "ephemeral" },  // Cache for 5 minutes
    },
  ],
  messages: [{ role: "user", content: userQuery }],
});
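Whether the cache is actually being hit can be checked from the usage block of the response. The `cache_creation_input_tokens` / `cache_read_input_tokens` field names below follow the Anthropic Messages API; treat this as a sketch and confirm the exact fields your SDK version returns:

```typescript
// Summarize cache effectiveness from a Messages API usage object.
// Field names follow the Anthropic API; confirm them for your SDK version.
interface UsageLike {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Fraction of total input tokens that were served from the prompt cache.
export function cacheHitRatio(usage: UsageLike): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const total =
    usage.input_tokens + read + (usage.cache_creation_input_tokens ?? 0);
  return total === 0 ? 0 : read / total;
}
```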

5 Common Pitfalls

1. Region not supported

Claude on Bedrock is not available in all regions. As of 2026, us-east-1 and us-west-2 are the primary regions. To use it from Tokyo, enable Cross-region inference.

// Cross-region inference uses an inference profile ID (with a geo prefix
// such as "us."), not a plain foundation-model ID
const crossRegionModelId = "us.anthropic.claude-sonnet-4-6";

2. Forgetting to request model access

In Bedrock, you must request “Model access” for each model you want to use. Calling a model without requesting access will result in an AccessDeniedException. Always request access before coding with Claude Code.

3. Lambda timeout too short

Claude responses can take 10–30 seconds. The Lambda default of 3 seconds will almost certainly time out. Set it to at least 30 seconds, and 60–300 seconds for longer generations.
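In CDK, the timeout is one line on the function definition. A sketch assuming a NodejsFunction construct inside a Stack (the construct ID and memory size are illustrative):

```typescript
// CDK sketch: give the AI handler enough time for long generations.
// This goes inside a Stack constructor, where `this` is the scope.
import { Duration } from "aws-cdk-lib";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";

const aiHandler = new NodejsFunction(this, "AiHandler", {
  entry: "src/lambda/ai-handler.ts",
  timeout: Duration.seconds(60), // default is 3s — far too short for Claude
  memorySize: 512,
});
```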

4. Confusing Bedrock model IDs with Anthropic API IDs

❌ Using the Anthropic API ID directly: "claude-sonnet-4-6"
✅ Bedrock ID: "anthropic.claude-sonnet-4-6"

5. Not accounting for Cross-region inference latency

Calling models in us-east-1 from Tokyo adds round-trip network latency (approximately 100–200ms). For latency-sensitive applications, use streaming to reduce perceived delay.


Summary

| Task | Claude Code's Contribution |
|---|---|
| Basic implementation | Generates AnthropicBedrock client and functions |
| Lambda integration | Generates handler and IAM policy together |
| RAG implementation | Auto-generates Knowledge Base integration code |
| Cost optimization | Designs model selection logic by task type |
| Troubleshooting | Identifies cause and suggests fix from error logs |

Develop with Claude Code, run in production on Bedrock — this combination satisfies security, cost, and scalability requirements all at once. Bedrock has no upfront commitment, so start small with pay-as-you-go usage, and when you're ready to go to production, all you need is to configure the IAM role.


#claude-code #aws #bedrock #anthropic #typescript #generative-ai



About the Author

Masa

Engineer obsessed with Claude Code. Runs claudecode-lab.com, a 10-language tech media with 2,000+ pages.