Going to production

Your API integration works in dev. It passes tests. But production is a different world: costs compound, errors happen at 3 AM, and a single bug can drain your API budget in minutes. This lesson covers the operational layer that keeps an integration safe there: structured logging, cost monitoring, a kill switch, model fallback, and rate-limit handling.

Structured logging

Log every API call with enough context to debug later:

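import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();
// `logger` is any structured logger from your application that exposes info/warn/error.

// Non-streaming params, so `response` is a complete Message with `usage`.
async function callClaude(params: Anthropic.MessageCreateParamsNonStreaming) {
  const start = Date.now();
  try {
    const response = await client.messages.create(params);
    logger.info("claude_api_call", {
      model: params.model,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
      cache_read: response.usage.cache_read_input_tokens ?? 0,
      latency_ms: Date.now() - start,
      stop_reason: response.stop_reason,
    });
    return response;
  } catch (error) {
    logger.error("claude_api_error", {
      model: params.model,
      // `error` is `unknown` in strict TypeScript, so narrow before reading `.message`.
      error: error instanceof Error ? error.message : String(error),
      latency_ms: Date.now() - start,
    });
    throw error;
  }
}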

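From these logs you can answer: how much did today cost, and what is our average latency? The record above does not identify the caller, so add a field naming the calling endpoint if you also want to know which endpoint is using the most tokens.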
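To make the "how much did today cost" question concrete, here is a minimal sketch of aggregating those log records. The `CallLog` and `Price` shapes and the `summarizeLogs` helper are hypothetical names introduced for illustration, and the prices are passed in by the caller rather than taken from any official price list.

```typescript
// Hypothetical shape of one "claude_api_call" record, mirroring the fields logged above.
interface CallLog {
  model: string;
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
}

// Assumed pricing table entry, in USD per million tokens. Supply current prices yourself.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Totals cost and average latency across a set of log records.
function summarizeLogs(logs: CallLog[], prices: Record<string, Price>) {
  let costUsd = 0;
  let totalLatencyMs = 0;
  for (const log of logs) {
    const price = prices[log.model];
    if (!price) throw new Error(`No price configured for model ${log.model}`);
    costUsd +=
      (log.input_tokens * price.inputPerMTok +
        log.output_tokens * price.outputPerMTok) / 1_000_000;
    totalLatencyMs += log.latency_ms;
  }
  return {
    calls: logs.length,
    costUsd,
    avgLatencyMs: logs.length > 0 ? totalLatencyMs / logs.length : 0,
  };
}
```

In practice this aggregation would more likely run as a query in your log platform; the function only shows which fields the answer depends on.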

Cost monitoring

Track daily spend and set alerts:

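import Anthropic from "@anthropic-ai/sdk";

// Prices in USD per million tokens as used in this lesson; check current pricing for your model.
const INPUT_USD_PER_MTOK = 5;
const OUTPUT_USD_PER_MTOK = 25;

// Simple daily accumulator. It is process-local and is not reset here:
// reset it at the start of each day, and use a shared store if you run several instances.
const dailyTokens = { input: 0, output: 0 };

function trackUsage(response: Anthropic.Message) {
  dailyTokens.input += response.usage.input_tokens;
  dailyTokens.output += response.usage.output_tokens;

  const dailyCostUsd =
    (dailyTokens.input * INPUT_USD_PER_MTOK +
      dailyTokens.output * OUTPUT_USD_PER_MTOK) / 1_000_000;

  // DAILY_BUDGET_USD, disableAIFeatures, and alertOncall come from your application.
  if (dailyCostUsd > DAILY_BUDGET_USD) {
    disableAIFeatures();
    alertOncall("AI budget exceeded: $" + dailyCostUsd.toFixed(2));
  }
}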
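
The accumulator above keeps counting across days and would alert on every call once the budget is crossed. As a rough sketch of one way to handle both, the hypothetical `DailyBudget` class below resets when the UTC date changes and fires its callback only once per day. The two price constants repeat the per-million-token figures used above and are assumptions to replace with your own.

```typescript
// Assumed prices in USD per million tokens, matching the figures used in this lesson.
const INPUT_USD_PER_MTOK = 5;
const OUTPUT_USD_PER_MTOK = 25;

class DailyBudget {
  private day = "";
  private costUsd = 0;
  private alerted = false;
  private readonly limitUsd: number;
  private readonly onExceeded: (costUsd: number) => void;

  constructor(limitUsd: number, onExceeded: (costUsd: number) => void) {
    this.limitUsd = limitUsd;
    this.onExceeded = onExceeded;
  }

  // Records one call's usage. Returns true while spend is within the limit.
  record(inputTokens: number, outputTokens: number, now: Date = new Date()): boolean {
    const today = now.toISOString().slice(0, 10); // UTC date, "YYYY-MM-DD"
    if (today !== this.day) {
      // New day: start counting from zero and re-arm the alert.
      this.day = today;
      this.costUsd = 0;
      this.alerted = false;
    }
    this.costUsd +=
      (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1_000_000;
    if (this.costUsd > this.limitUsd && !this.alerted) {
      this.alerted = true; // fire the callback once per day, not once per call
      this.onExceeded(this.costUsd);
    }
    return this.costUsd <= this.limitUsd;
  }
}
```

The state is still held in one process, so a multi-instance deployment would need to keep the running total in a shared store instead.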

Task budgets (new in Opus 4.7, public beta) let you cap token spend on autonomous agents directly through the API — no custom tracking needed. Set a budget, and Claude prioritizes work within it.

The kill switch

A feature flag that disables AI features without deploying code:

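// Read from environment, database, or feature flag service.
// Checked on every call: a flag read once at startup cannot change without a restart.
// A plain environment variable is usually fixed for the life of the process, so back
// this function with a flag service or database to flip it on a running system.
function isAIEnabled(): boolean {
  return process.env.AI_KILL_SWITCH !== "off";
}

async function summarize(text: string): Promise<string> {
  if (!isAIEnabled()) {
    return "AI features are temporarily unavailable.";
  }
  const response = await callClaude({ /* ... */ });
  // Return the first text block so the result matches Promise<string>.
  const block = response.content[0];
  return block?.type === "text" ? block.text : "";
}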

When something goes wrong at 3 AM, flip one flag and the AI features go dark while everything else keeps working. No deploy, no rollback, no waiting.

Model fallback

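Don't put all your eggs in one model. If Opus 4.7 is overloaded or your budget is tight, fall back to Sonnet. Fall back only on server errors (5xx), not on client errors (4xx): a request that is rejected as invalid will be rejected by the next model too.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const MODELS = ["claude-opus-4-7", "claude-sonnet-4-6"] as const;

async function callWithFallback(messages: Anthropic.MessageParam[]) {
  for (const [i, model] of MODELS.entries()) {
    try {
      return await client.messages.create({
        model,
        max_tokens: 1024,
        messages,
      });
    } catch (error) {
      // Only 5xx responses justify trying another model; rethrow everything else.
      const isServerError =
        error instanceof Anthropic.APIError && (error.status ?? 0) >= 500;
      const isLastModel = i === MODELS.length - 1;
      if (!isServerError || isLastModel) throw error;
      const message = error instanceof Error ? error.message : String(error);
      logger.warn("model_fallback", { from: model, error: message });
    }
  }
  // Only reachable if MODELS is empty.
  throw new Error("No models configured");
}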

Rate limits

Claude's API has per-minute and per-day rate limits based on your usage tier. Key practices:

Build retry logic with exponential backoff (the SDK does this by default)
Spread batch-like workloads over time rather than bursting
Monitor 429 Too Many Requests responses — they indicate you're hitting your ceiling
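
For code paths that do not go through the SDK's built-in retries (a job queue of your own, for example), the backoff idea above can be sketched as follows. The names `backoffDelayMs` and `withBackoff`, the default base and cap, and the use of full jitter are all illustrative choices rather than the SDK's own implementation.

```typescript
// Delay before retry number `attempt` (0-based): doubles each attempt, is capped,
// and is then scaled by a random factor (full jitter) so clients do not retry in lockstep.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `fn`, retrying up to `maxRetries` more times while `isRetryable(error)` is true.
async function withBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

If you wrap SDK calls with something like this, keep in mind that the SDK's own retries still apply underneath, so the total number of attempts multiplies.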

The production checklist

Before your first real user touches the AI feature:

Structured logging on every API call (model, tokens, latency, errors)
Daily cost tracking with a hard budget alert
Kill switch accessible without a deploy
Model fallback from expensive to cheap
Rate limit handling with exponential backoff
Input validation before sending to the API (prevent injection, limit size)
Output validation before showing to users (schema check, content filter)
Error messages that don't expose API internals
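
The last three checklist items can be sketched in a few lines. Everything here is an assumption made for illustration: the `MAX_INPUT_CHARS` limit, the `Summary` shape, and the helper names. It covers size limits, a schema check, and a generic error message only; it does not attempt injection defenses or content filtering.

```typescript
// Assumed size limit; tune it to your feature and model context window.
const MAX_INPUT_CHARS = 20_000;

// Rejects empty or oversized input before it is sent to the API.
function validateInput(text: string): string {
  const trimmed = text.trim();
  if (trimmed.length === 0) throw new Error("Input is empty");
  if (trimmed.length > MAX_INPUT_CHARS) {
    throw new Error(`Input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  return trimmed;
}

// Hypothetical structured output that a summarization feature might expect.
interface Summary {
  title: string;
  bullets: string[];
}

// Returns a Summary only if the model output is valid JSON with the expected shape.
function parseSummary(raw: string): Summary | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { title, bullets } = data as Record<string, unknown>;
  if (typeof title !== "string") return null;
  if (!Array.isArray(bullets)) return null;
  if (!bullets.every((b): b is string => typeof b === "string")) return null;
  return { title, bullets };
}

// Maps any internal failure to a message that reveals nothing about the API call.
function toUserMessage(_error: unknown): string {
  return "Something went wrong generating this response. Please try again.";
}
```

Log the original error through your structured logger before calling `toUserMessage`, so the detail is kept for debugging without reaching the user.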

Course complete

You've covered the entire Claude API — from your first call to a production-grade integration with vision, caching, batches, and operational guardrails. This is everything you need to build AI-powered features that work reliably at scale.

For the next step in your learning path, take Claude Code 101 to learn how to use Claude as an agent-powered coding assistant.

Your API integration works in development. It passes its tests. But production is a different world, where costs accumulate, errors happen at 3 AM, and a single bug can drain your API budget in minutes.

Structured logging

Log every API call with enough context to diagnose it later: the model, input and output tokens, cache-read tokens, latency, and stop reason. From these logs you can answer: how much did today cost, what is the average latency, and which endpoint consumes the most tokens.

Cost monitoring

Track daily spend and set alerts. Task budgets (new in Opus 4.7, public beta) let you cap token spend on autonomous agents directly through the API, with no custom tracking. Set a budget and Claude prioritizes work within it.

The kill switch

A feature flag that disables AI features without deploying code. When something goes wrong at 3 AM, flip one flag and the AI features go dark while everything else keeps working. No deploy, no rollback, no waiting.

Model fallback

Don't put all your eggs in one model. If Opus 4.7 is overloaded or your budget is tight, fall back to Sonnet. Fall back on server errors only (5xx), not on client errors (4xx).

Rate limits

Claude's API has per-minute and per-day rate limits based on your usage tier. Build retry logic with exponential backoff (the SDK does this by default), spread batch-like workloads over time rather than bursting, and monitor 429 responses.

The production checklist

Before your first real user touches the AI feature: structured logging on every call, daily cost tracking with a hard budget alert, a kill switch accessible without a deploy, model fallback from expensive to cheap, rate-limit handling with exponential backoff, input validation before sending, output validation before display, and error messages that don't expose API internals.

Course complete

You've covered the entire Claude API, from your first call to a production integration with vision, caching, batches, and operational guardrails. This is everything you need to build AI-powered features that work reliably at scale.

For the next step in your learning path, take Claude Code 101 to learn how to use Claude as an agent-powered coding assistant.

