Claude Code × AWS CloudWatch: The Complete Guide | Log Analysis, Alarm Setup & Dashboard Automation

“There's an error in production! But there are too many logs and I don't know where to start.” This is the classic panic of incident response.

CloudWatch is AWS's standard monitoring service, but at high log volume critical information gets buried and alarm configuration tends to be put off. I monitor an ECS + Lambda system at work, and letting Claude Code read the logs to identify causes has cut our average incident response time by 40%.

This article walks through practical steps for automating CloudWatch log analysis, alarm design, and dashboard creation with Claude Code.

Key CloudWatch Components

CloudWatch Logs      : Stores and searches logs from applications and AWS services
CloudWatch Metrics   : Numeric data such as CPU utilization and request counts
CloudWatch Alarms    : Detects threshold breaches and sends notifications to SNS and other targets
CloudWatch Dashboards: Custom views for visualizing metrics and logs
Logs Insights        : Query engine for analyzing logs with a pipe-style query syntax

Step 1: Delegate Log Pattern Analysis to Claude Code

During an incident, the first priority is to understand the pattern in the error logs.

# Fetch error logs from the last hour (the date -d syntax requires GNU date)
aws logs filter-log-events \
  --log-group-name "/ecs/myapp" \
  --start-time $(date -d "1 hour ago" +%s000) \
  --filter-pattern "ERROR" \
  --output json > error-logs.json

claude -p "
Analyze the following CloudWatch error logs and:

1. Classify the errors by type (5xx, 4xx, DB connection errors, timeouts, etc.)
2. Identify the most frequent errors
3. Determine when the error count spiked
4. Propose hypotheses about the root cause
5. Suggest the next investigation steps

$(head -500 error-logs.json)
"

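Before handing raw logs to Claude Code, it can help to pre-aggregate them so the prompt stays small. The following is a minimal sketch, assuming the `events` array that `aws logs filter-log-events` writes to `error-logs.json`. The keyword rules in `categorizeMessage` are hypothetical and need adapting to your own log format.

```typescript
// One event as printed by `aws logs filter-log-events --output json`.
interface LogEvent {
  timestamp: number; // milliseconds since the Unix epoch
  message: string;
}

type ErrorCategory = "db_connection" | "timeout" | "5xx" | "4xx" | "other";

// Hypothetical keyword rules (an assumption): adapt them to your log format.
function categorizeMessage(message: string): ErrorCategory {
  const text = message.toLowerCase();
  if (/connection refused|econnrefused|too many connections/.test(text)) return "db_connection";
  if (/timed out|timeout/.test(text)) return "timeout";
  if (/\b5\d\d\b/.test(text)) return "5xx";
  if (/\b4\d\d\b/.test(text)) return "4xx";
  return "other";
}

interface ErrorSummary {
  countsByCategory: Record<ErrorCategory, number>;
  peakMinuteIso: string | null; // start of the busiest one-minute bucket
  peakMinuteCount: number;
}

// Counts events per category and finds the minute with the most errors.
function summarizeErrors(events: LogEvent[]): ErrorSummary {
  const countsByCategory: Record<ErrorCategory, number> = {
    db_connection: 0, timeout: 0, "5xx": 0, "4xx": 0, other: 0,
  };
  const countsByMinute: Record<string, number> = {};
  for (const logEvent of events) {
    countsByCategory[categorizeMessage(logEvent.message)] += 1;
    const minuteStart = Math.floor(logEvent.timestamp / 60000) * 60000;
    const key = String(minuteStart);
    countsByMinute[key] = (countsByMinute[key] || 0) + 1;
  }
  let peakMinuteIso: string | null = null;
  let peakMinuteCount = 0;
  for (const key of Object.keys(countsByMinute)) {
    const count = countsByMinute[key] || 0;
    if (count > peakMinuteCount) {
      peakMinuteCount = count;
      peakMinuteIso = new Date(Number(key)).toISOString();
    }
  }
  return { countsByCategory, peakMinuteIso, peakMinuteCount };
}
```

Pass it the `events` array parsed from `error-logs.json`, then include the summary in the prompt together with a smaller sample of raw lines.
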
Generate Logs Insights Queries Automatically

claude -p "
Write CloudWatch Logs Insights queries for the following goals:

1. Error rate per endpoint over the last hour (top 10)
2. Details of requests with latency above 500 ms
3. All operation logs for a specific user (user_id: 12345)
4. The first error that appears within 30 minutes after a deployment

Log format: JSON (timestamp, level, message, user_id, endpoint, duration_ms, status_code)
"

Examples of the generated Logs Insights queries (run each one separately):

# Error count per endpoint (top 10). This counts errors; it does not divide by total requests.
fields @timestamp, endpoint, status_code
| filter status_code >= 400
| stats count() as error_count by endpoint
| sort error_count desc
| limit 10

# Requests with latency above 500 ms
fields @timestamp, endpoint, duration_ms, user_id
| filter duration_ms > 500
| sort duration_ms desc
| limit 50

# Operation logs for a specific user
fields @timestamp, level, message, endpoint
| filter user_id = "12345"
| sort @timestamp desc
| limit 100

Step 2: Generate the Alarm Configuration Automatically

claude -p "
Design all the CloudWatch alarms needed for the following system.
Implement them in CDK TypeScript.

[System Architecture]
- ECS Fargate (API server, 2–10 instances)
- RDS PostgreSQL
- ALB (Application Load Balancer)
- Lambda (batch processing)

[Alarm Requirements]
- Production: alarms that fire within 5 minutes
- Notification targets: SNS → Slack and PagerDuty
- Two-level alarms (Warning / Critical)
- Outside business hours: Critical only
"

// lib/monitoring-stack.ts
import * as cdk from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as actions from "aws-cdk-lib/aws-cloudwatch-actions";
import * as sns from "aws-cdk-lib/aws-sns";

export class MonitoringStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const alertTopic = sns.Topic.fromTopicArn(
      this, "AlertTopic",
      `arn:aws:sns:${this.region}:${this.account}:prod-alerts`
    );
    const warnTopic = sns.Topic.fromTopicArn(
      this, "WarnTopic",
      `arn:aws:sns:${this.region}:${this.account}:prod-warnings`
    );

    // ALB 5xx error alarm (based on an absolute count, not a percentage)
    const alb5xxAlarm = new cloudwatch.Alarm(this, "Alb5xxAlarm", {
      alarmName: "prod-alb-5xx-critical",
      alarmDescription: "ALB target 5xx count exceeded 10 in a 5-minute period",
      metric: new cloudwatch.Metric({
        namespace: "AWS/ApplicationELB",
        metricName: "HTTPCode_Target_5XX_Count",
        dimensionsMap: { LoadBalancer: "app/myapp/xxx" },
        statistic: "Sum",
        period: cdk.Duration.minutes(5),
      }),
      threshold: 10,
      evaluationPeriods: 2,
      comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
      treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
    });
    alb5xxAlarm.addAlarmAction(new actions.SnsAction(alertTopic));

    // ECS CPU utilization alarms (Warning/Critical)
    const ecsCpuWarning = new cloudwatch.Alarm(this, "EcsCpuWarning", {
      alarmName: "prod-ecs-cpu-warning",
      metric: new cloudwatch.Metric({
        namespace: "AWS/ECS",
        metricName: "CPUUtilization",
        dimensionsMap: { ClusterName: "myapp-cluster", ServiceName: "myapp-service" },
        statistic: "Average",
        period: cdk.Duration.minutes(5),
      }),
      threshold: 70,
      evaluationPeriods: 3,
    });
    ecsCpuWarning.addAlarmAction(new actions.SnsAction(warnTopic));

    const ecsCpuCritical = new cloudwatch.Alarm(this, "EcsCpuCritical", {
      alarmName: "prod-ecs-cpu-critical",
      metric: new cloudwatch.Metric({
        namespace: "AWS/ECS",
        metricName: "CPUUtilization",
        dimensionsMap: { ClusterName: "myapp-cluster", ServiceName: "myapp-service" },
        statistic: "Average",
        period: cdk.Duration.minutes(5),
      }),
      threshold: 90,
      evaluationPeriods: 2,
    });
    ecsCpuCritical.addAlarmAction(new actions.SnsAction(alertTopic));

    // RDS connection count alarm
    const rdsConnectionAlarm = new cloudwatch.Alarm(this, "RdsConnectionAlarm", {
      alarmName: "prod-rds-connections-critical",
      metric: new cloudwatch.Metric({
        namespace: "AWS/RDS",
        metricName: "DatabaseConnections",
        dimensionsMap: { DBInstanceIdentifier: "myapp-db" },
        statistic: "Maximum",
        period: cdk.Duration.minutes(5),
      }),
      threshold: 80,  // Absolute connection count, not a percentage; set it to about 80% of your instance's max_connections
      evaluationPeriods: 2,
    });
    rdsConnectionAlarm.addAlarmAction(new actions.SnsAction(alertTopic));

    // Lambda error alarm
    const lambdaErrorAlarm = new cloudwatch.Alarm(this, "LambdaErrorAlarm", {
      alarmName: "prod-lambda-errors-critical",
      metric: new cloudwatch.Metric({
        namespace: "AWS/Lambda",
        metricName: "Errors",
        dimensionsMap: { FunctionName: "myapp-batch" },
        statistic: "Sum",
        period: cdk.Duration.minutes(15),
      }),
      threshold: 5,
      evaluationPeriods: 1,
    });
    lambdaErrorAlarm.addAlarmAction(new actions.SnsAction(alertTopic));
  }
}
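
The prompt asked for alarms that fire within 5 minutes, so it is worth checking what the generated settings imply. The sketch below is a rough lower bound under one assumption: without `datapointsToAlarm`, an alarm needs `evaluationPeriods` consecutive breaching periods, so the breach must last at least `period × evaluationPeriods`. It ignores metric delivery delay. The alarm names and values are copied from the stack above; `AlarmTiming` and `minutesToAlarm` are illustrative names.

```typescript
interface AlarmTiming {
  alarmName: string;
  periodMinutes: number;
  evaluationPeriods: number;
}

// Minimum duration of a sustained breach before the alarm can fire.
function minutesToAlarm(timing: AlarmTiming): number {
  return timing.periodMinutes * timing.evaluationPeriods;
}

// Settings taken from the generated MonitoringStack.
const alarmTimings: AlarmTiming[] = [
  { alarmName: "prod-alb-5xx-critical", periodMinutes: 5, evaluationPeriods: 2 },
  { alarmName: "prod-ecs-cpu-warning", periodMinutes: 5, evaluationPeriods: 3 },
  { alarmName: "prod-ecs-cpu-critical", periodMinutes: 5, evaluationPeriods: 2 },
  { alarmName: "prod-rds-connections-critical", periodMinutes: 5, evaluationPeriods: 2 },
  { alarmName: "prod-lambda-errors-critical", periodMinutes: 15, evaluationPeriods: 1 },
];

const targetMinutes = 5; // from the "fire within 5 minutes" requirement

const slowAlarms = alarmTimings.filter(
  (timing) => minutesToAlarm(timing) > targetMinutes,
);

for (const timing of slowAlarms) {
  console.log(`${timing.alarmName}: needs at least ${minutesToAlarm(timing)} minutes`);
}
```

Every alarm in the generated stack needs 10 to 15 minutes by this measure, so meeting the 5-minute requirement would mean shortening `period` (for example to 1 minute) rather than accepting the output as-is.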

Step 3: Generate a Custom Dashboard Automatically

claude -p "
Create a CloudWatch dashboard in CDK that shows the following information.

[Dashboard Layout]
Row 1: Overall system health (ALB request count, 5xx rate, P50/P95/P99 latency)
Row 2: ECS service (CPU, memory, running task count)
Row 3: RDS (connections, latency, CPU utilization)
Row 4: Lambda (invocations, errors, duration)
Row 5: Business metrics (new sign-ups, payment success rate) ← custom metrics
"

// Dashboard definition (excerpt)
const dashboard = new cloudwatch.Dashboard(this, "AppDashboard", {
  dashboardName: "myapp-production",
});

// ALB metrics are published per load balancer, so each metric needs this dimension.
// "app/myapp/xxx" is the same placeholder used in the alarm stack.
const albDimensions = { LoadBalancer: "app/myapp/xxx" };
const albCount = (metricName: string) => new cloudwatch.Metric({
  namespace: "AWS/ApplicationELB",
  metricName,
  dimensionsMap: albDimensions,
  statistic: "Sum",
  period: cdk.Duration.minutes(1),
});

dashboard.addWidgets(
  new cloudwatch.Row(
    new cloudwatch.GraphWidget({
      title: "ALB Request Count",
      left: [albCount("RequestCount")],
      width: 8,
    }),
    new cloudwatch.GraphWidget({
      title: "ALB 5xx Error Rate (%)",
      left: [new cloudwatch.MathExpression({
        // Metric math IDs must start with a lowercase letter, so "5xx" is not a valid key.
        expression: "m5xx / (m2xx + m3xx + m4xx + m5xx) * 100",
        usingMetrics: {
          m5xx: albCount("HTTPCode_Target_5XX_Count"),
          m2xx: albCount("HTTPCode_Target_2XX_Count"),
          m3xx: albCount("HTTPCode_Target_3XX_Count"),
          m4xx: albCount("HTTPCode_Target_4XX_Count"),
        },
        period: cdk.Duration.minutes(1),
      })],
      width: 8,
    }),
  )
);
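
To sanity-check what the 5xx rate widget should display, the same arithmetic can be reproduced offline. This is a small sketch with illustrative names (`StatusCounts`, `errorRatePercent`). Unlike the dashboard expression, it explicitly returns `null` when there was no traffic.

```typescript
interface StatusCounts {
  count2xx: number;
  count3xx: number;
  count4xx: number;
  count5xx: number;
}

// Share of 5xx responses among all target responses, in percent.
// Returns null when there were no responses, to avoid dividing by zero.
function errorRatePercent(counts: StatusCounts): number | null {
  const total = counts.count2xx + counts.count3xx + counts.count4xx + counts.count5xx;
  if (total === 0) return null;
  return (counts.count5xx * 100) / total;
}

// 5 errors out of 100 responses in the period
console.log(errorRatePercent({ count2xx: 90, count3xx: 3, count4xx: 2, count5xx: 5 })); // 5
```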

Step 4: Delegate Incident Investigation to Claude Code

claude -p "
I want to investigate a production incident. Run the following commands and analyze the results:

1. aws logs filter-log-events --log-group-name '/ecs/myapp' \
   --start-time \$(date -d '2 hours ago' +%s000) \
   --filter-pattern 'ERROR' --limit 100

2. aws cloudwatch get-metric-statistics \
   --namespace AWS/ApplicationELB \
   --metric-name HTTPCode_Target_5XX_Count \
   --dimensions Name=LoadBalancer,Value=app/myapp/xxx \
   --start-time \$(date -d '2 hours ago' -u +%Y-%m-%dT%H:%M:%SZ) \
   --end-time \$(date -u +%Y-%m-%dT%H:%M:%SZ) \
   --period 300 --statistics Sum

Based on the results above, summarize:
- When the incident started
- The estimated number of affected users
- The top 3 root-cause hypotheses
- Immediate response actions
"

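The "when did the incident start" question can also be cross-checked mechanically. The sketch below assumes the `Datapoints` array returned by `aws cloudwatch get-metric-statistics --statistics Sum`, which is not guaranteed to be in time order. The threshold is a value you choose, and `findIncidentStart` is an illustrative name.

```typescript
// One datapoint from the get-metric-statistics response.
interface SumDatapoint {
  Timestamp: string; // ISO 8601 timestamp
  Sum: number;
}

// Returns the timestamp of the earliest period whose Sum exceeds the threshold,
// or null if no period does.
function findIncidentStart(datapoints: SumDatapoint[], threshold: number): string | null {
  const sorted = datapoints
    .slice()
    .sort((a, b) => Date.parse(a.Timestamp) - Date.parse(b.Timestamp));
  for (const datapoint of sorted) {
    if (datapoint.Sum > threshold) return datapoint.Timestamp;
  }
  return null;
}

const sampleDatapoints: SumDatapoint[] = [
  { Timestamp: "2024-01-01T10:10:00Z", Sum: 42 },
  { Timestamp: "2024-01-01T10:00:00Z", Sum: 1 },
  { Timestamp: "2024-01-01T10:05:00Z", Sum: 17 },
];

console.log(findIncidentStart(sampleDatapoints, 10)); // 2024-01-01T10:05:00Z
```

Comparing this timestamp with Claude Code's answer is a cheap way to catch a misread of the raw output.
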
Step 5: Design Custom Metrics Automatically

claude -p "
Write Node.js code (AWS SDK v3) that records the following e-commerce business KPIs
as CloudWatch custom metrics.

Metrics to record:
- Successful and failed payment counts (every minute)
- Shopping cart abandonment rate (every 5 minutes)
- New member sign-ups (every hour)

Namespace: MyApp/Business
Tag each metric with an environment label (Production/Staging)
"

// src/monitoring/business-metrics.ts
import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({ region: process.env.AWS_REGION });
const NAMESPACE = "MyApp/Business";
const ENV = process.env.NODE_ENV ?? "development";

export async function recordPaymentSuccess() {
  await cw.send(new PutMetricDataCommand({
    Namespace: NAMESPACE,
    MetricData: [{
      MetricName: "PaymentSuccess",
      Value: 1,
      Unit: "Count",
      Dimensions: [{ Name: "Environment", Value: ENV }],
    }],
  }));
}

export async function recordPaymentFailure(reason: string) {
  await cw.send(new PutMetricDataCommand({
    Namespace: NAMESPACE,
    MetricData: [{
      MetricName: "PaymentFailure",
      Value: 1,
      Unit: "Count",
      Dimensions: [
        { Name: "Environment", Value: ENV },
        { Name: "Reason", Value: reason },
      ],
    }],
  }));
}
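
The excerpt above covers only the payment counters. For the cart abandonment rate from the prompt, one possible shape is to compute the percentage locally and publish a single datum every 5 minutes. This is a sketch under assumptions: the metric name `CartAbandonmentRate`, the function name, and the way the two input counts are collected are all hypothetical. The returned object has the shape of one `MetricData` entry and would be sent with `PutMetricDataCommand` as in the functions above.

```typescript
// Shape of one entry in MetricData for PutMetricDataCommand (only the fields used here).
interface RateDatum {
  MetricName: string;
  Value: number;
  Unit: "Percent";
  Dimensions: { Name: string; Value: string }[];
}

// Builds the datum for one 5-minute window.
// Returns null when no carts were created, because the rate is undefined then.
function buildCartAbandonmentDatum(
  cartsCreated: number,
  ordersCompleted: number,
  environment: string,
): RateDatum | null {
  if (cartsCreated <= 0) return null;
  const abandoned = Math.max(0, cartsCreated - ordersCompleted);
  const ratePercent = Math.min(100, (abandoned * 100) / cartsCreated);
  return {
    MetricName: "CartAbandonmentRate", // hypothetical metric name
    Value: ratePercent,
    Unit: "Percent",
    Dimensions: [{ Name: "Environment", Value: environment }],
  };
}

// 200 carts created and 50 orders completed gives an abandonment rate of 75%.
console.log(buildCartAbandonmentDatum(200, 50, "production"));
```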

4 Common Pitfalls to Avoid

1. evaluationPeriods set too low

// ❌ A single momentary spike triggers the alarm
evaluationPeriods: 1,
threshold: 10,

// ✅ The alarm fires only when 2 of the last 3 periods breach the threshold (fewer false positives)
evaluationPeriods: 3,
threshold: 10,
datapointsToAlarm: 2,  // 2 breaching datapoints out of 3 evaluated periods
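
The effect of `datapointsToAlarm` is easier to see with a small simulation. This is a simplified sketch of M-out-of-N evaluation, not CloudWatch's actual implementation: it assumes a strict greater-than comparison and ignores missing-data handling. `isInAlarm` is an illustrative name.

```typescript
// Returns true when at least `datapointsToAlarm` of the most recent
// `evaluationPeriods` values exceed the threshold.
function isInAlarm(
  recentValues: number[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number,
): boolean {
  if (evaluationPeriods < 1) throw new Error("evaluationPeriods must be at least 1");
  if (recentValues.length < evaluationPeriods) return false; // not enough data yet
  const evaluated = recentValues.slice(-evaluationPeriods);
  const breaching = evaluated.filter((value) => value > threshold).length;
  return breaching >= datapointsToAlarm;
}

// evaluationPeriods: 1 fires on a single spike.
console.log(isInAlarm([2, 50], 10, 1, 1)); // true

// 2 out of 3: a lone spike is ignored.
console.log(isInAlarm([2, 50, 3], 10, 3, 2)); // false

// 2 out of 3: two breaches fire the alarm even when they are not consecutive.
console.log(isInAlarm([50, 3, 40], 10, 3, 2)); // true
```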

2. Ignoring Logs Insights costs

Logs Insights charges by the amount of data scanned. Running queries without limiting the time range can lead to an unexpected bill. Always specify --start-time and --end-time.
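
One way to make a bounded time range the default is to compute the window in a helper that caps the lookback. This is a minimal sketch: `boundedQueryWindow` and its parameters are illustrative, and the cap value is up to you. The returned values are epoch seconds, which is the unit that `aws logs start-query` expects for `--start-time` and `--end-time`.

```typescript
interface QueryWindow {
  startTime: number; // epoch seconds
  endTime: number;   // epoch seconds
}

// Builds a query window that ends at `now` and never reaches back
// further than `maxLookbackMinutes`.
function boundedQueryWindow(
  now: Date,
  lookbackMinutes: number,
  maxLookbackMinutes: number,
): QueryWindow {
  if (!(lookbackMinutes > 0)) throw new Error("lookbackMinutes must be positive");
  const cappedMinutes = Math.min(lookbackMinutes, maxLookbackMinutes);
  const endTime = Math.floor(now.getTime() / 1000);
  return { startTime: endTime - cappedMinutes * 60, endTime };
}

// Ask for one hour, with a cap of three hours.
const lastHour = boundedQueryWindow(new Date(), 60, 180);
console.log(`--start-time ${lastHour.startTime} --end-time ${lastHour.endTime}`);
```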

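3. High-resolution custom metrics are expensive

Custom metrics are never free: CloudWatch bills each custom metric monthly, and only the basic metrics that AWS services publish come at no charge. High-resolution (1-second) metrics add cost on top of that through more frequent PutMetricData calls and pricier high-resolution alarms. One-minute aggregation is usually enough for business metrics.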
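
Cost also scales with the number of distinct metrics. CloudWatch treats each unique combination of metric name and dimension values as a separate metric, so a free-form dimension such as `Reason` in `recordPaymentFailure` can multiply the count. The sketch below estimates how many distinct metrics a set of datums would create. `countDistinctMetrics` and the sample reason values are illustrative.

```typescript
interface DimensionPair {
  Name: string;
  Value: string;
}

interface DatumIdentity {
  MetricName: string;
  Dimensions: DimensionPair[];
}

// Builds a stable key from the metric name and its dimensions, independent of dimension order.
function metricKey(datum: DatumIdentity): string {
  const sorted = datum.Dimensions.slice().sort((a, b) =>
    a.Name < b.Name ? -1 : a.Name > b.Name ? 1 : 0,
  );
  const parts = sorted.map((dimension) => `${dimension.Name}=${dimension.Value}`);
  return [datum.MetricName].concat(parts).join("|");
}

// Counts unique metric name + dimension combinations.
function countDistinctMetrics(datums: DatumIdentity[]): number {
  const seen: Record<string, boolean> = {};
  for (const datum of datums) {
    seen[metricKey(datum)] = true;
  }
  return Object.keys(seen).length;
}

const envDimension: DimensionPair = { Name: "Environment", Value: "production" };
const publishedDatums: DatumIdentity[] = [
  { MetricName: "PaymentSuccess", Dimensions: [envDimension] },
  { MetricName: "PaymentFailure", Dimensions: [envDimension, { Name: "Reason", Value: "card_declined" }] },
  { MetricName: "PaymentFailure", Dimensions: [envDimension, { Name: "Reason", Value: "timeout" }] },
  { MetricName: "PaymentFailure", Dimensions: [{ Name: "Reason", Value: "timeout" }, envDimension] },
];

console.log(countDistinctMetrics(publishedDatums)); // 3
```

Keeping `Reason` to a small, fixed set of values keeps this number predictable.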

4. Not setting a log retention period

The default is "Never expire", so storage costs keep growing. Always set a retention period on the log group, including the groups that Lambda creates.

import * as logs from "aws-cdk-lib/aws-logs";

new logs.LogGroup(this, "AppLogGroup", {
  logGroupName: "/ecs/myapp",
  retention: logs.RetentionDays.ONE_MONTH,  // Deleted automatically after 30 days
});

Summary

Task	Claude Code's Contribution
Log analysis	Reads error logs and proposes root-cause hypotheses along with next steps
Logs Insights queries	Generates queries from a description of the analysis goal
Alarm configuration	Generates the CDK code in one pass from a description of the system
Dashboard	Generates widget definitions from a description of what you want to display
Incident investigation	Runs AWS CLI commands and analyzes the results

“We'll set up monitoring later,” and then an incident hits with no visibility at all. With Claude Code, you can have production-grade alarms and dashboards ready in 30 minutes.
