---
title: "How to implement a circuit breaker at the API gateway"
description: "When a backend fails, retry storms can make recovery even harder. Learn how to implement the circuit breaker pattern as custom TypeScript policies in Zuplo to automatically stop traffic to failing services, with per-route thresholds and RFC 7807 error responses."
canonicalUrl: "https://zuplo.com/blog/2026/03/17/how-to-implement-circuit-breaker-at-the-api-gateway"
pageType: "blog"
date: "2026-03-17"
authors: "martyn"
tags: "API Gateway"
image: "https://zuplo.com/og?text=How%20to%20Implement%20a%20Circuit%20Breaker%20at%20the%20API%20Gateway"
---
Imagine this scenario: Your backend goes down. Every client retries
simultaneously. The retry storm adds more load, making recovery harder.
Meanwhile, your gateway is burning resources on requests that will never succeed.

Sounds like a bad day, right? Fortunately, there's an approach that you can use
to prevent this at the gateway level.

The **circuit breaker pattern** monitors backend health and automatically stops
forwarding traffic when a service is failing, giving it time to recover without
being hammered by doomed requests.

<CalloutAudience
  variant="useIf"
  items={[
    `Your API proxies to backends that occasionally fail or slow down`,
    `You want to prevent retry storms from overwhelming recovering services`,
    `You need per-route failure thresholds (stricter for payments, relaxed for search)`,
    `You want RFC 7807 error responses when the circuit trips`,
  ]}
/>

## The three states

A circuit breaker is a state machine with three states:

1. **Closed**: Requests flow normally. The breaker tracks failures in a rolling
   window.
2. **Open**: Failures have reached the threshold. All requests immediately get a
   503 response. No traffic reaches the backend.
3. **Half-open**: After a cooldown period, the breaker allows a test request
   through. If it succeeds, the circuit closes. If it fails, it opens again.
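Stripped of gateway specifics, the three states fit in a small class. The
following is a minimal, framework-free sketch (the `CircuitBreakerSketch` name
and its methods are hypothetical, not part of any Zuplo API) that mirrors the
counter-based logic of the policies below. It takes the current time as a
parameter so the transitions are easy to exercise in a unit test.

```ts
// Minimal, framework-free sketch of the three-state circuit breaker.
// Hypothetical class: it mirrors the counter-based logic of the policies in
// this post but keeps state in memory instead of a shared cache.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreakerSketch {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0; // ms timestamp of the failure that (re)opened the circuit
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Inbound side: decide whether a request may reach the backend.
  allowRequest(now: number): boolean {
    if (this.state === "open") {
      if (now - this.openedAt < this.cooldownMs) {
        return false; // still cooling down: fail fast
      }
      // Cooldown over: move to half-open and let traffic test the backend.
      this.state = "half-open";
    }
    return true;
  }

  // Outbound side: record the outcome of a request that reached the backend.
  recordResult(success: boolean, now: number): void {
    if (success) {
      if (this.state === "half-open") {
        this.state = "closed"; // test succeeded: resume normal traffic
        this.failures = 0;
      }
      return;
    }
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open"; // trip (or re-trip) and restart the cooldown
      this.openedAt = now;
    }
  }
}
```

In the gateway, `allowRequest` plays the role of the inbound policy and
`recordResult` the role of the outbound policy. The difference is that the
policies keep this state in a shared cache rather than in one object's fields.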

For a deeper look at the pattern and how it fits into a broader resilience
strategy (retries, timeouts, bulkheads), see the
[API Gateway Resilience and Fault Tolerance](https://zuplo.com/learning-center/api-gateway-resilience-fault-tolerance)
article in our learning center.

## The implementation

In a programmable gateway, you can implement this as two custom policies that
share state: an inbound policy that checks the circuit before each request, and
an outbound policy that tracks failures from backend responses.

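The shared state lives in
[ZoneCache](https://zuplo.com/docs/programmable-api/zone-cache), Zuplo's
low-latency cache within each deployment zone. Because the cache is scoped to a
zone, each zone tracks its own circuit state: a circuit can be open in one zone
while it is still closed in another.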

### Inbound policy: check the circuit

This policy runs before the request reaches your backend. If the circuit is
open, it short-circuits and returns a 503 immediately.

```ts
// modules/circuit-breaker-inbound.ts
import {
  ZuploContext,
  ZuploRequest,
  ZoneCache,
  HttpProblems,
} from "@zuplo/runtime";

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

interface CircuitBreakerOptions {
  failureThreshold: number;
  cooldownSeconds: number;
  backendId: string;
  stateTtlSeconds?: number;
}

const DEFAULT_STATE: CircuitState = {
  failures: 0,
  lastFailure: 0,
  state: "closed",
};

export default async function circuitBreakerInbound(
  request: ZuploRequest,
  context: ZuploContext,
  options: CircuitBreakerOptions,
  policyName: string,
) {
  const cache = new ZoneCache<CircuitState>("circuit-breaker", context);
  const cacheKey = `cb:${options.backendId}`;

  const state = (await cache.get(cacheKey)) ?? { ...DEFAULT_STATE };

  if (state.state === "open") {
    const elapsed = Date.now() - state.lastFailure;

    if (elapsed < options.cooldownSeconds * 1000) {
      // Still within cooldown, reject immediately
      context.log.warn(`Circuit open for backend '${options.backendId}'.`);

      return HttpProblems.serviceUnavailable(request, context, {
        detail: `Service temporarily unavailable. Retry after ${options.cooldownSeconds} seconds.`,
      });
    }

    // Cooldown expired, transition to half-open
    state.state = "half-open";
    await cache.put(cacheKey, state, options.stateTtlSeconds ?? 300);
  }

  return request;
}
```

When the circuit is open and the cooldown hasn't expired, the client gets a
standard [RFC 7807](https://datatracker.ietf.org/doc/html/rfc7807) problem
response with a 503 status. No request ever reaches the backend. The response
looks like this:

```json
{
  "type": "https://httpproblems.com/http-status/503",
  "title": "Service Unavailable",
  "status": 503,
  "detail": "Service temporarily unavailable. Retry after 30 seconds.",
  "instance": "/v1/payments",
  "trace": {
    "timestamp": "2025-03-17T10:42:03.128Z",
    "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
```

This is a standard Zuplo problem response. Clients can check for the 503 status
and back off before retrying.
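On the client side, that backoff can be as simple as the following sketch. It is
not part of Zuplo; the function names, the 1-second base delay, and the
30-second cap (chosen to match the example cooldown) are assumptions. It uses
exponential backoff with full jitter so that clients recovering from the same
outage don't all retry at the same instant, which would recreate the retry storm
the breaker is meant to prevent.

```ts
// Hypothetical client-side helpers (not part of Zuplo) for reacting to a 503
// from the gateway. Exponential backoff with "full jitter", capped.
function retryDelayMs(
  attempt: number, // 0 before the first retry, 1 before the second, ...
  baseMs = 1_000, // assumed base delay
  capMs = 30_000, // assumed cap, matching the example 30-second cooldown
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a random delay in [0, ceiling) so clients don't retry in lockstep.
  return Math.floor(random() * ceiling);
}

// Retry only when the gateway reports the service as unavailable, and give
// up after a fixed number of attempts.
function shouldRetry(status: number, attempt: number, maxAttempts = 5): boolean {
  return status === 503 && attempt < maxAttempts;
}
```

A client would keep retrying while `shouldRetry` returns `true`, waiting
`retryDelayMs(attempt)` milliseconds before each new attempt.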

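Once the cooldown period passes, the policy transitions to half-open and lets
the next request through as a test. This minimal version doesn't block other
requests while the circuit is half-open, so under steady traffic several
requests can reach the backend before the outbound policy closes or reopens the
circuit.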

### Outbound policy: track failures

This policy inspects backend responses and updates the circuit state. On a
failure, it increments the counter, and once the counter reaches the threshold,
it opens the circuit. A success while the circuit is half-open closes it and
resets the counter.

```ts
// modules/circuit-breaker-outbound.ts
import { ZuploContext, ZuploRequest, ZoneCache } from "@zuplo/runtime";

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

interface CircuitBreakerOptions {
  failureThreshold: number;
  cooldownSeconds: number;
  backendId: string;
  stateTtlSeconds?: number;
}

const DEFAULT_STATE: CircuitState = {
  failures: 0,
  lastFailure: 0,
  state: "closed",
};

export default async function circuitBreakerOutbound(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext,
  options: CircuitBreakerOptions,
  policyName: string,
) {
  const cache = new ZoneCache<CircuitState>("circuit-breaker", context);
  const cacheKey = `cb:${options.backendId}`;

  const state = (await cache.get(cacheKey)) ?? { ...DEFAULT_STATE };

  if (response.ok) {
    // Success during half-open: close the circuit
    if (state.state === "half-open") {
      context.log.info(`Circuit closing for backend '${options.backendId}'.`);
      state.state = "closed";
      state.failures = 0;
      state.lastFailure = 0;
      await cache.put(cacheKey, state, options.stateTtlSeconds ?? 300);
    }

    return response;
  }

  // Failure: increment counter
  state.failures += 1;
  state.lastFailure = Date.now();

  context.log.warn(
    `Backend '${options.backendId}' returned ${response.status}. ` +
      `Failures: ${state.failures}/${options.failureThreshold}.`,
  );

  if (state.failures >= options.failureThreshold) {
    context.log.error(`Circuit opening for backend '${options.backendId}'.`);
    state.state = "open";
  }

  await cache.put(cacheKey, state, options.stateTtlSeconds ?? 300);

  return response;
}
```

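The outbound policy uses `response.ok` to classify success vs. failure. This
treats any 2xx response as a success and everything else, including 3xx
redirects and 4xx client errors, as a failure. That can be too strict: a burst
of 401s or 404s caused by bad client requests would trip the circuit even though
the backend is healthy. You can customize this. For example, you might only
count 5xx responses as failures and treat 4xx client errors as normal: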

```ts
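// Only count server errors (5xx) as failures; treat 4xx client errors as normal
const isFailure = response.status >= 500;

if (!isFailure) {
  // ...same success handling as before (close the circuit if half-open)
  return response;
}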
```
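If you also want to factor in latency, the classification can move into a small
function. This is a sketch under assumptions: the `FailureRules` option names
are hypothetical (not Zuplo settings), and it assumes you already have the
request's latency in milliseconds; measuring it is outside the scope of this
example. Unlike `response.ok`, it treats 3xx responses as successes.

```ts
// Sketch of a configurable failure classifier for the outbound policy.
// The FailureRules option names are hypothetical, not Zuplo settings.
interface FailureRules {
  countClientErrors: boolean; // also treat 4xx responses as failures
  slowThresholdMs?: number; // optionally treat slow responses as failures
}

function classifyFailure(status: number, latencyMs: number, rules: FailureRules): boolean {
  if (status >= 500) return true; // server errors always count
  if (status >= 400 && rules.countClientErrors) return true;
  if (rules.slowThresholdMs !== undefined && latencyMs > rules.slowThresholdMs) {
    return true; // the backend answered, but too slowly
  }
  return false; // 1xx-3xx, and 4xx unless countClientErrors is set
}
```

In the outbound policy, a check like
`if (!classifyFailure(response.status, latencyMs, rules))` would take the place
of `if (response.ok)`.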

### Wiring it up

Add both policies to your `policies.json` and attach them to the route:

```json
// config/policies.json
{
  "policies": [
    {
      "name": "circuit-breaker-inbound",
      "policyType": "custom-code-inbound",
      "handler": {
        "export": "default",
        "module": "$import(./modules/circuit-breaker-inbound)",
        "options": {
          "failureThreshold": 5,
          "cooldownSeconds": 30,
          "backendId": "my-backend-api"
        }
      }
    },
    {
      "name": "circuit-breaker-outbound",
      "policyType": "custom-code-outbound",
      "handler": {
        "export": "default",
        "module": "$import(./modules/circuit-breaker-outbound)",
        "options": {
          "failureThreshold": 5,
          "cooldownSeconds": 30,
          "backendId": "my-backend-api"
        }
      }
    }
  ]
}
```

Then reference both policies on any route that should be protected. In
`routes.oas.json`, this `policies` object goes inside the route's
`x-zuplo-route` section:

```json
"policies": {
  "inbound": ["circuit-breaker-inbound"],
  "outbound": ["circuit-breaker-outbound"]
}
```

<CalloutSample
  title="Circuit Breaker Example"
  description="A demo example that implements the circuit breaker pattern as inbound and outbound policies. Deploy directly to your Zuplo account or run locally."
  deployUrl="https://zuplo.com/examples/circuit-breaker"
  repoUrl="https://github.com/zuplo/zuplo/tree/main/examples/circuit-breaker"
  localCommand="npx create-zuplo-api --example circuit-breaker"
/>

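The `backendId` option is the key to per-route customization. It determines
which cache entry a policy reads and writes, so the inbound and outbound
policies in a pair must use the same value (and the same `stateTtlSeconds`, if
you set it). Define a separate pair of policy entries for each backend, each
with its own `backendId` and thresholds, and each backend gets its own
independent circuit state. A payment service can trip after 3 failures while a
search endpoint tolerates 10.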
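Because the options live on each policy entry, per-backend thresholds mean one
inbound/outbound pair per backend. One way to keep each pair consistent is to
generate the entries rather than hand-edit them. The `circuitBreakerPolicies`
helper and the `circuit-breaker-<direction>-<backendId>` naming scheme below are
hypothetical; the output follows the `policies.json` structure shown above.

```ts
// Hypothetical generator for a matching inbound/outbound policy pair per
// backend, following the policies.json structure used in this post.
interface BreakerConfig {
  backendId: string;
  failureThreshold: number;
  cooldownSeconds: number;
  stateTtlSeconds?: number;
}

function circuitBreakerPolicies(config: BreakerConfig) {
  // Both policies get identical options, so they always share a cache key.
  const options = { ...config };
  return (["inbound", "outbound"] as const).map((direction) => ({
    name: `circuit-breaker-${direction}-${config.backendId}`,
    policyType: `custom-code-${direction}`,
    handler: {
      export: "default",
      module: `$import(./modules/circuit-breaker-${direction})`,
      options,
    },
  }));
}

const policies = [
  // Critical flow: trip quickly.
  ...circuitBreakerPolicies({ backendId: "payments-api", failureThreshold: 3, cooldownSeconds: 30 }),
  // Non-critical reads: tolerate more failures.
  ...circuitBreakerPolicies({ backendId: "search-api", failureThreshold: 10, cooldownSeconds: 30 }),
];

// Print the JSON to paste into config/policies.json.
console.log(JSON.stringify({ policies }, null, 2));
```

Each protected route then references the pair for its backend, for example
`circuit-breaker-inbound-payments-api` and
`circuit-breaker-outbound-payments-api`.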

If you're also using other policies like rate limiting or authentication, order
matters. The circuit breaker inbound policy should run after authentication (no
point checking the circuit for unauthenticated requests) but before rate
limiting (a tripped circuit should return 503 before consuming a rate limit
token).

## Choosing thresholds

Getting thresholds right matters. Too sensitive and you'll trip on transient
errors. Too generous and real outages affect clients for too long.

**Failure threshold**: Start with 5 failures. For critical payment flows, drop
it to 2 or 3. For search or non-critical reads, 10 is reasonable.

**Cooldown period**: 30 seconds is a good starting point. Long enough for most
transient issues to resolve, short enough that you aren't blocking traffic for
ages if the backend recovered quickly.

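**Cache TTL** (`stateTtlSeconds`): This is a safety net. Every time a policy
writes the circuit state, the entry's TTL restarts. If nothing is written for
this period (successful requests while the circuit is closed don't write), the
entry expires and the circuit resets to closed with a zero failure count. In
practice, the failure threshold applies to failures that each arrive within
`stateTtlSeconds` of the previous one. Keep it longer than `cooldownSeconds`, or
an open circuit can expire before its cooldown ends. The default of 300 seconds
(5 minutes) works for most cases. Set it higher for low-traffic routes, where
failures arrive further apart.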
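A few of these settings interact, so it can be worth rejecting bad combinations
early. Here is a hypothetical validation helper (not part of Zuplo) that reuses
the option names from the policies above; you could call it at the top of each
policy or from a unit test for your configuration. The TTL check exists because
if the cache entry expires before the cooldown ends, an open circuit resets to
closed early.

```ts
// Hypothetical sanity check for circuit breaker options, using the same
// option names and TTL fallback as the two policies in this post.
interface CircuitBreakerOptions {
  failureThreshold: number;
  cooldownSeconds: number;
  backendId: string;
  stateTtlSeconds?: number;
}

const DEFAULT_STATE_TTL_SECONDS = 300; // same fallback the policies use

function validateOptions(options: CircuitBreakerOptions): void {
  const ttl = options.stateTtlSeconds ?? DEFAULT_STATE_TTL_SECONDS;
  if (!Number.isInteger(options.failureThreshold) || options.failureThreshold < 1) {
    throw new Error("failureThreshold must be a positive integer");
  }
  if (options.cooldownSeconds <= 0) {
    throw new Error("cooldownSeconds must be greater than zero");
  }
  // The cache entry must outlive the cooldown, or the circuit closes early.
  if (ttl <= options.cooldownSeconds) {
    throw new Error(
      `stateTtlSeconds (${ttl}) must be longer than cooldownSeconds (${options.cooldownSeconds})`,
    );
  }
  if (options.backendId.trim() === "") {
    throw new Error("backendId must not be empty");
  }
}
```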

## Testing the circuit breaker

You can verify the circuit breaker works without waiting for a real outage. The
quickest approach is to create a mock backend using
[Mockbin](https://mockbin.io) that returns a 500 error. Create a new bin and
configure the response like this:

- **Status**: `500`
- **Headers**: `Content-Type: application/json`
- **Body**:

```json
{
  "error": "Internal Server Error",
  "message": "Simulated backend failure"
}
```

Copy the bin URL and use it as your route's backend URL. Every request to that
route will now get a 500 response, which the outbound policy counts as a
failure.

For more control, you can swap your route handler for a simple one that fails on
demand via a query parameter:

```ts
// modules/test-handler.ts
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function (request: ZuploRequest, context: ZuploContext) {
  const fail = request.query.fail === "true";

  if (fail) {
    return new Response("Internal Server Error", { status: 500 });
  }

  return new Response(JSON.stringify({ status: "ok" }), {
    headers: { "content-type": "application/json" },
  });
}
```

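Set your failure threshold to 3 and cooldown to 10 seconds so you can cycle
through the states quickly, then:

1. Send a few normal requests to confirm they pass through (circuit closed).
2. Send 3 failing requests (via the Mockbin route or `?fail=true`) to trip the
   circuit.
3. Send another request and confirm you get a 503 with no backend call.
4. Wait 10 seconds, send a successful request (without `?fail=true`), and
   confirm the circuit closes.

Steps 1 and 4 need requests that succeed, so the full cycle works with the test
handler. Against the Mockbin backend every request fails, so the test request in
step 4 reopens the circuit instead of closing it.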

Check your Zuplo logs for the circuit state transitions. You should see the
`warn` and `error` messages as failures accumulate and the circuit opens, and
the `info` message when it closes again.

## Why implement this in code?

Config-based gateways that support circuit breakers typically give you a few
knobs: threshold, cooldown, maybe a status code filter. That works until it
doesn't.

With a programmable gateway, the circuit breaker logic is just TypeScript. You
can:

- Factor in response latency, not just error codes
- Use different failure detection per route without duplicating config
- Send alerts (via a webhook in the outbound policy) when a circuit opens
- Log structured circuit state changes to your observability stack
- Implement gradual recovery in half-open state instead of a single test request

The tradeoff is that you maintain the code. But it's ~60 lines per policy and
the logic is straightforward.

<CalloutDoc
  title="Custom Code Policies"
  description="Full reference for writing custom inbound and outbound policies in TypeScript."
  href="https://zuplo.com/docs/policies/custom-code-inbound"
/>

## Deploy a circuit breaker in seconds with GitOps

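A circuit breaker adds a little overhead to every protected request (a ZoneCache
read in each policy, plus a write when a failure is recorded or the state
changes) that you might not need when your backends are healthy. The good news:
you don't have to treat this as permanent infrastructure.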

Because Zuplo projects are Git repos, adding a circuit breaker to a route is a
code change. When a backend starts misbehaving, you can:

1. Add the two policy modules to your project.
2. Define them in `policies.json` and reference them on the affected route.
3. Push to your branch. Zuplo deploys in seconds.

Once your production gateway rebuilds, the circuit breaker is live.

Once the backend is stable again, you can remove the policies from the route and
push again. You're back to zero overhead.

This works well as an incident response tool. Keep the policy modules (and their
`policies.json` definitions) in your repo but don't attach them to any routes.
When something goes wrong, wiring them up is a small change: add the two policy
names to the route's `inbound` and `outbound` lists. If you deploy preview
branches to their own environments, you can even test the circuit breaker on a
preview branch before promoting it to production.

## Going further

This implementation covers the core pattern. A few things you might add for
production use:

**Rolling window**: Instead of a simple counter, track failures within a time
window (e.g., 5 failures in the last 60 seconds). Reset the counter when the
window rolls over.
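A minimal sketch of that rolling window, assuming you replace the plain
`failures` counter in `CircuitState` with a window start time and a count (the
`FailureWindow` type and both functions are hypothetical). This is a fixed
window that resets when it rolls over, as described above. It can miss a burst
that straddles a window boundary; a sliding window that stores individual
failure timestamps avoids that at the cost of a larger cache entry.

```ts
// Hypothetical replacement for the plain `failures` counter: a fixed time
// window that resets when it rolls over.
interface FailureWindow {
  windowStart: number; // ms timestamp when the current window began
  count: number; // failures recorded in the current window
}

// Record one failure at `now` and return the updated window.
function recordFailure(current: FailureWindow, now: number, windowMs: number): FailureWindow {
  if (now - current.windowStart >= windowMs) {
    // The previous window has rolled over: start a new one with this failure.
    return { windowStart: now, count: 1 };
  }
  return { windowStart: current.windowStart, count: current.count + 1 };
}

// The circuit should open once the current window holds enough failures.
function windowTripped(current: FailureWindow, failureThreshold: number): boolean {
  return current.count >= failureThreshold;
}
```

The outbound policy would store the returned window in the cache and open the
circuit when `windowTripped` returns `true`.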

**Gradual half-open recovery**: Allow 3 test requests through in half-open state
instead of one. Close the circuit only if all 3 succeed.
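A sketch of that bookkeeping, assuming two hypothetical counters stored
alongside the circuit state and zeroed each time the circuit enters half-open.
The inbound policy would call `admitProbe` while half-open and reject requests
beyond the limit; the outbound policy would call `recordProbe` with each probe's
outcome. Because ZoneCache reads and writes are separate calls, concurrent
requests can occasionally admit an extra probe, so treat the limit as
approximate.

```ts
// Hypothetical half-open bookkeeping for gradual recovery. Reset both
// counters to zero each time the circuit enters half-open.
interface HalfOpenState {
  admitted: number; // test requests let through so far
  succeeded: number; // test requests that came back successful
}

type ProbeDecision = "admit" | "reject";
type ProbeOutcome = "half-open" | "closed" | "open";

// Inbound side: admit up to `probeLimit` test requests, reject the rest.
function admitProbe(s: HalfOpenState, probeLimit: number): [ProbeDecision, HalfOpenState] {
  if (s.admitted >= probeLimit) {
    return ["reject", s];
  }
  return ["admit", { ...s, admitted: s.admitted + 1 }];
}

// Outbound side: any failed probe reopens the circuit; it closes only after
// `probeLimit` successes.
function recordProbe(
  s: HalfOpenState,
  success: boolean,
  probeLimit: number,
): [ProbeOutcome, HalfOpenState] {
  if (!success) {
    return ["open", { admitted: 0, succeeded: 0 }];
  }
  const next = { ...s, succeeded: s.succeeded + 1 };
  return [next.succeeded >= probeLimit ? "closed" : "half-open", next];
}
```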

**Alerting**: Fire a webhook or write to a queue when the circuit opens. Your
on-call team should know when a backend is failing hard enough to trip the
breaker.
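For alerting, the main design point is to notify on transitions, not on every
failure. The sketch below (hypothetical names; the payload shape is an
assumption, so adapt it to your alerting tool) builds an alert only when the
circuit moves into `open` from another state, which includes a failed half-open
test request that reopens it. In the outbound policy, `from` would be the value
of `state.state` captured before the update. Delivering the payload is then a
`fetch` POST to your webhook, ideally run as background work so it doesn't
delay the client's response.

```ts
// Hypothetical alert builder: returns a payload only when the circuit moves
// into "open" from another state. The payload shape is an assumption.
type CircuitStateName = "closed" | "open" | "half-open";

interface CircuitOpenAlert {
  backendId: string;
  from: CircuitStateName;
  failures: number;
  openedAt: string; // ISO-8601 timestamp
}

function buildOpenAlert(
  backendId: string,
  from: CircuitStateName,
  to: CircuitStateName,
  failures: number,
  now: number,
): CircuitOpenAlert | null {
  // No alert unless this update is a transition into "open"; failures from
  // in-flight requests while already open don't re-alert.
  if (to !== "open" || from === "open") {
    return null;
  }
  return { backendId, from, failures, openedAt: new Date(now).toISOString() };
}
```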

**Combine with retries and timeouts**: Circuit breakers work best alongside
other resilience patterns. Add a timeout to prevent slow backends from holding
connections, and a retry policy for transient errors that happen while the
circuit is closed.
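To make retries and the breaker cooperate, the retry decision can consult the
circuit state. In this hypothetical sketch, retries happen only while the
circuit is closed, only for statuses assumed to be transient (502, 503, and
504), and only up to a fixed limit. An open circuit already fails fast, and a
half-open circuit should see a single attempt per test request rather than a
burst of retries.

```ts
// Hypothetical retry gate: retry transient failures only while the circuit
// is closed. Which statuses count as transient is an assumption.
type CircuitStateName = "closed" | "open" | "half-open";

const TRANSIENT_STATUSES = new Set([502, 503, 504]);

function shouldRetryUpstream(
  status: number,
  attempt: number, // retries already made for this request
  maxRetries: number,
  circuit: CircuitStateName,
): boolean {
  // Open: fail fast. Half-open: keep each test request to a single attempt.
  if (circuit !== "closed") return false;
  if (attempt >= maxRetries) return false;
  return TRANSIENT_STATUSES.has(status);
}
```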