
API Gateway Resilience and Fault Tolerance: Circuit Breakers, Retries, and Graceful Degradation

Nate Totten
March 16, 2026
14 min read

Learn how to implement resilience patterns like circuit breakers, retries, timeouts, and graceful degradation at the API gateway layer to protect your services from cascading failures.

When a backend service goes down, the blast radius depends entirely on where you handle failure. If every microservice implements its own retry logic, timeout handling, and error responses, you end up with inconsistent behavior, duplicated effort, and gaps that let failures cascade. The API gateway is the natural place to centralize resilience — it’s the single control plane sitting between your clients and every upstream service.

This guide covers the core resilience patterns you should implement at the gateway layer, how they work together, and how to build them with a programmable gateway.

Why Resilience Belongs at the Gateway Layer

In a microservices architecture, the gateway processes every inbound request before it reaches your backends. That position makes it uniquely suited for resilience:

  • Single enforcement point — resilience policies apply consistently across all routes without duplicating logic in every service
  • Early failure detection — the gateway can detect unhealthy backends and stop forwarding traffic before clients experience long timeouts
  • Client isolation — one consumer’s traffic spike doesn’t take down services for everyone else
  • Centralized observability — all failures, retries, and circuit breaker trips are visible in one place

Without gateway-level resilience, a single slow database query can cascade into connection pool exhaustion across multiple services, timeout storms that amplify load, and eventually a full system outage. The patterns below prevent that cascade at its source.

Circuit Breaker Pattern

The circuit breaker is the most important resilience pattern for API gateways. It monitors requests to each backend and automatically stops forwarding traffic when a service is failing — preventing your gateway from wasting resources on requests that will never succeed.

How Circuit Breakers Work

A circuit breaker operates in three states:

  • Closed — requests flow normally. The breaker tracks failure rates (error count, timeout rate, or error percentage) within a rolling time window
  • Open — when failures exceed a configured threshold, the breaker “trips” and immediately returns an error response for all requests to that backend. No traffic reaches the failing service, giving it time to recover
  • Half-open — after a cooldown period, the breaker allows a small number of test requests through. If they succeed, the breaker returns to closed. If they fail, it goes back to open

This state machine prevents two critical problems. First, it stops retry storms — when hundreds of clients simultaneously retry against a failing service, each retry adds more load and makes recovery harder. Second, it gives failing services breathing room to recover without being hammered by requests that will time out anyway.

Implementing a Circuit Breaker in a Programmable Gateway

In a programmable gateway like Zuplo, you can implement circuit breaker logic directly in TypeScript using a custom inbound policy. Here’s a practical implementation using Zuplo’s ZoneCache to track circuit state across requests:

import {
  ZuploContext,
  ZuploRequest,
  ZoneCache,
  HttpProblems,
} from "@zuplo/runtime";

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

export default async function circuitBreakerPolicy(
  request: ZuploRequest,
  context: ZuploContext,
  options: {
    failureThreshold: number;
    cooldownSeconds: number;
    backendId: string;
  },
  policyName: string,
) {
  const cache = new ZoneCache<CircuitState>("circuit-breaker", context);
  const cacheKey = `cb:${options.backendId}`;
  const state = (await cache.get(cacheKey)) ?? {
    failures: 0,
    lastFailure: 0,
    state: "closed" as const,
  };

  // Check if circuit is open
  if (state.state === "open") {
    const elapsed = Date.now() - state.lastFailure;
    if (elapsed < options.cooldownSeconds * 1000) {
      context.log.warn(`Circuit open for ${options.backendId}`);
      return HttpProblems.serviceUnavailable(request, context, {
        detail: "Service temporarily unavailable. Please retry later.",
      });
    }
    // Cooldown elapsed: transition to half-open and persist the new state
    // so the paired outbound policy sees it when it records the next result
    state.state = "half-open";
    await cache.put(cacheKey, state, options.cooldownSeconds * 2);
  }

  // Allow request through
  return request;
}

This inbound policy handles the circuit state check — it blocks requests when the circuit is open and passes them through otherwise. You would pair it with a custom outbound policy that inspects the backend response, increments the failure counter on errors, and transitions the circuit to the open state when the threshold is exceeded.
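
The transition logic that outbound policy needs can be sketched framework-free. Everything below (the `recordResult` name, the reset-on-success behavior) is illustrative, not a Zuplo API; an outbound policy would load the state from ZoneCache, call something like this with `response.ok`, and write the result back:

```typescript
interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

// Pure transition function: given the current circuit state and whether
// the backend call succeeded, return the next circuit state.
function recordResult(
  state: CircuitState,
  success: boolean,
  failureThreshold: number,
  now: number = Date.now(),
): CircuitState {
  if (success) {
    // A success in half-open (or closed) resets the circuit entirely.
    return { failures: 0, lastFailure: 0, state: "closed" };
  }
  const failures = state.failures + 1;
  // A failure during half-open re-opens immediately; in closed state,
  // the circuit opens only once the threshold is reached.
  const shouldOpen =
    state.state === "half-open" || failures >= failureThreshold;
  return {
    failures,
    lastFailure: now,
    state: shouldOpen ? "open" : "closed",
  };
}
```

Keeping the transition as a pure function makes the state machine trivially unit-testable, independent of how the state is stored.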

The key advantage of implementing circuit breakers in code rather than declarative config is flexibility. You can customize failure detection per route — a payment service might trip after 3 failures, while a search service tolerates 10. You can also implement sophisticated health scoring that factors in response times, not just error codes.

Configuring Circuit Breaker Thresholds

Getting thresholds right requires understanding your backend’s failure modes:

  • Failure threshold — start with 5 failures in a 60-second window. Too low triggers false positives from transient errors. Too high lets real outages affect clients longer
  • Cooldown period — 30-60 seconds is typical. This should be long enough for the backend to recover from temporary issues
  • Half-open test count — allow 1-3 requests through to test recovery. If all succeed, close the circuit

Retry Policies and Exponential Backoff

Not every failure is permanent. Network blips, brief connection resets, and temporary overloads resolve themselves in seconds. Retry policies handle these transient failures by automatically resending requests — but only when done carefully.

Why Naive Retries Are Dangerous

A simple “retry 3 times immediately” policy can make outages worse. If a backend is struggling under load and 1,000 clients each retry 3 times immediately, you just tripled the load on an already overloaded service. This is a retry storm, and it’s one of the most common causes of extended outages in microservices architectures.

Exponential Backoff with Jitter

The solution is exponential backoff with jitter. Instead of retrying immediately, each attempt waits exponentially longer, and a random component prevents synchronized retries:

function calculateBackoff(
  attempt: number,
  baseDelayMs: number = 100,
  maxDelayMs: number = 10000,
): number {
  // Exponential backoff: 100ms, 200ms, 400ms, 800ms...
  const exponentialDelay = baseDelayMs * Math.pow(2, attempt);

  // Cap at maximum delay
  const cappedDelay = Math.min(exponentialDelay, maxDelayMs);

  // Add jitter: random value between 0 and the calculated delay
  const jitter = Math.random() * cappedDelay;

  return Math.floor(jitter);
}
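
A retry wrapper built on this backoff calculation might look like the following sketch. The `retryWithBackoff` helper and its parameters are illustrative, not part of any gateway SDK:

```typescript
// Sketch: retry an async operation with full-jitter exponential backoff.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 100,
  maxDelayMs: number = 10000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts - 1) break; // no delay after final attempt
      const exponential = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      const delay = Math.floor(Math.random() * exponential); // full jitter
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Note that the delay is drawn uniformly from [0, capped delay]; this "full jitter" strategy spreads synchronized clients more effectively than adding a small random offset to a fixed delay.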

Idempotency Considerations

Before adding retries to a route, you must determine whether the operation is safe to retry:

  • Safe to retry — GET requests, idempotent PUT requests, operations with idempotency keys
  • Dangerous to retry — non-idempotent POST requests (creating orders, processing payments) unless the backend supports idempotency keys
  • Conditionally safe — requests where you can check whether the original succeeded before retrying

Your gateway should only enable automatic retries on routes where the backend operation is idempotent. For non-idempotent operations, return the error to the client and let them decide whether to retry with an idempotency key.
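
These rules can be expressed as a small gate the retry logic consults before resending. The `Idempotency-Key` header convention and the `isSafeToRetry` name are assumptions for illustration:

```typescript
// Sketch: decide whether the gateway may retry a request automatically.
// HTTP defines GET, HEAD, and OPTIONS as safe, and PUT and DELETE as
// idempotent; POST and PATCH are only retryable when the client supplied
// an idempotency key the backend can deduplicate on.
function isSafeToRetry(method: string, headers: Headers): boolean {
  const idempotentMethods = ["GET", "HEAD", "OPTIONS", "PUT", "DELETE"];
  if (idempotentMethods.includes(method.toUpperCase())) {
    return true;
  }
  // Non-idempotent method: retry only if the backend supports dedup.
  return headers.has("Idempotency-Key");
}
```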

Timeout Management

Timeouts are the most fundamental resilience mechanism. Without them, a hung backend silently consumes gateway connections until the entire system grinds to a halt.

Types of Timeouts

A complete timeout strategy involves multiple layers:

  • Connection timeout — how long the gateway waits to establish a TCP connection with the backend. Keep this short (5-10 seconds). If a backend isn’t accepting connections, waiting longer won’t help
  • Read timeout — how long the gateway waits for the backend to start responding after the connection is established. This varies by operation — a simple lookup might need 5 seconds, while a report generation endpoint might need 60
  • Total request timeout — the maximum end-to-end time for the entire request, including retries. This protects against scenarios where individual timeouts are acceptable but the cumulative time is too long

Per-Route Timeout Configuration

Different endpoints have fundamentally different performance characteristics. A health check should respond in milliseconds. A data export might legitimately take 30 seconds. Your gateway should configure timeouts per route, not globally.

In Zuplo, the platform limits define default connection timeouts between your gateway and origin servers — for example, a 19-second TCP connection timeout and a 180-second proxy read timeout on the managed edge. You can implement stricter per-route timeouts in a custom policy using AbortSignal:

import { ZuploContext, ZuploRequest, HttpProblems } from "@zuplo/runtime";

export default async function timeoutPolicy(
  request: ZuploRequest,
  context: ZuploContext,
  options: { timeoutMs: number },
  policyName: string,
) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), options.timeoutMs);

  try {
    // Forward to the backend; this assumes request.url already points at
    // the origin (e.g. after the route's URL rewrite has been applied)
    const response = await fetch(request.url, {
      method: request.method,
      headers: request.headers,
      body: request.body,
      signal: controller.signal,
    });
    clearTimeout(timeoutId);
    return response;
  } catch (error) {
    clearTimeout(timeoutId);
    if (error instanceof DOMException && error.name === "AbortError") {
      context.log.error(`Request timed out after ${options.timeoutMs}ms`);
      return HttpProblems.gatewayTimeout(request, context, {
        detail: `Backend did not respond within ${options.timeoutMs}ms`,
      });
    }
    throw error;
  }
}

Bulkhead Pattern

The bulkhead pattern borrows from ship design — bulkheads divide a ship’s hull into watertight compartments so that a breach in one section doesn’t flood the entire vessel. Applied to APIs, the pattern isolates resources per service or consumer to prevent cascade failures.

Why Bulkheads Matter

Without isolation, all routes through your gateway share the same connection pool. A single slow backend can exhaust every available connection, blocking requests to perfectly healthy services. Consider this scenario:

  1. Your payment service starts responding slowly (5-second responses instead of 200ms)
  2. Gateway connections to the payment service pile up, each waiting for a response
  3. The shared connection pool fills up
  4. Requests to your user service, product service, and search service all start failing — not because those services are unhealthy, but because there are no available connections

Implementing Bulkheads

Bulkhead implementation at the gateway layer typically means limiting concurrent requests per backend or per consumer:

  • Per-service bulkheads — limit the maximum concurrent requests to each backend service. If the payment service gets 50 concurrent connection slots, it can only consume those 50, even if it’s slow. The remaining connection capacity stays available for other services
  • Per-consumer bulkheads — limit concurrent requests per API key or user. One consumer’s burst of traffic can’t monopolize the gateway’s resources

In Zuplo, you can combine rate limiting per consumer with custom policies that track concurrent connections per backend to implement effective bulkhead isolation.
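
A per-backend concurrency cap can be sketched as a small counting semaphore. This is illustrative only; a real edge deployment would need state that survives across isolates rather than a single in-memory counter:

```typescript
// Sketch: a counting semaphore that caps concurrent requests per backend.
class Bulkhead {
  private inFlight = 0;

  constructor(private readonly maxConcurrent: number) {}

  // Returns false (shed the request) when the compartment is full.
  tryAcquire(): boolean {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight++;
    return true;
  }

  // Call in a finally block once the backend response completes.
  release(): void {
    this.inFlight = Math.max(0, this.inFlight - 1);
  }
}

// Usage sketch: one compartment per backend, sized to its capacity.
const bulkheads = new Map<string, Bulkhead>([
  ["payments", new Bulkhead(50)],
  ["search", new Bulkhead(200)],
]);
```

When `tryAcquire` fails, the gateway would return 503 immediately rather than queueing, so a slow payment service can never starve the search service's compartment.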

Rate Limiting as a Resilience Tool

Rate limiting is usually discussed as an access control mechanism, but it’s equally important as a resilience pattern. By capping the request rate per consumer, IP address, or API key, rate limiting prevents any single source of traffic from overwhelming your backends.

How Rate Limiting Protects Your Stack

Rate limiting provides several resilience benefits:

  • Backend protection — even if a client misbehaves, they can’t send more requests than your backend can handle
  • Consumer isolation — one consumer’s aggressive usage doesn’t degrade service for others
  • Predictable capacity — with rate limits in place, you can size your backend infrastructure based on known maximum throughput
  • DDoS mitigation — rate limits provide a first line of defense against volumetric attacks

Zuplo’s built-in rate limiting policy supports limiting by IP address, authenticated user, API key, or custom attributes. You can set different limits per route, per consumer tier, or per plan — and when limits are exceeded, clients receive a standard 429 Too Many Requests response with headers indicating when they can retry.

You can also implement dynamic rate limiting with a custom function that adjusts limits based on runtime conditions — for example, reducing allowed requests when your backend’s response time exceeds a threshold.
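
One way to sketch that adjustment is a function that scales a consumer's budget by how far backend latency overshoots a target. The tiers, numbers, and `dynamicLimit` name are invented for illustration, not a Zuplo API:

```typescript
// Sketch: shrink a consumer's rate limit as backend latency degrades.
function dynamicLimit(
  baseLimitPerMinute: number,
  observedP95LatencyMs: number,
  targetLatencyMs: number = 500,
): number {
  if (observedP95LatencyMs <= targetLatencyMs) {
    return baseLimitPerMinute; // backend healthy: full budget
  }
  // Scale the budget down proportionally to the latency overshoot,
  // but never below 10% of the base limit so clients aren't locked out.
  const scale = targetLatencyMs / observedP95LatencyMs;
  return Math.max(
    Math.floor(baseLimitPerMinute * scale),
    Math.floor(baseLimitPerMinute * 0.1),
  );
}
```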

Graceful Degradation Strategies

When a backend fails, the worst response is no response. Graceful degradation means returning something useful — even if it’s not the complete or freshest data — instead of an error.

Serving Cached Responses

If a backend is unavailable, the gateway can serve a previously cached response. This works well for data that doesn’t change frequently — product catalogs, configuration endpoints, or content feeds. The client gets slightly stale data instead of a 503 error.

Zuplo provides multiple caching layers to support this pattern. The built-in caching policy automatically caches responses based on configurable TTLs, and the ZoneCache API gives you programmatic control for implementing custom cache-on-failure logic:

import {
  ZuploContext,
  ZuploRequest,
  ZoneCache,
  HttpProblems,
} from "@zuplo/runtime";

export default async function handler(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const cache = new ZoneCache<string>("fallback-cache", context);
  const cacheKey = new URL(request.url).pathname;

  try {
    const response = await fetch("https://api.backend.example.com" + cacheKey);

    if (response.ok) {
      const body = await response.text();
      // Cache successful responses for fallback use
      cache.put(cacheKey, body, 300).catch((err) => context.log.error(err));
      return new Response(body, { headers: response.headers });
    }

    throw new Error(`Backend returned ${response.status}`);
  } catch (error) {
    // Try to serve from cache on failure
    const cached = await cache.get(cacheKey);
    if (cached) {
      context.log.warn("Serving cached fallback response");
      return new Response(cached, {
        headers: { "X-Fallback": "true", "Content-Type": "application/json" },
      });
    }

    return HttpProblems.serviceUnavailable(request, context, {
      detail: "Service temporarily unavailable",
    });
  }
}

Fallback Endpoints

For critical paths, configure fallback backends that the gateway routes to when the primary fails. This is different from load balancing — the fallback might be a simplified service that handles the most common request patterns, or a static endpoint returning default data.

Reduced Functionality Mode

Not every feature of your API is equally critical. When backends are under stress, the gateway can disable non-essential features (recommendations, related items, analytics tracking) while keeping core functionality (authentication, core data, transactions) fully operational.
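
This can be sketched as a load-aware feature gate, where each feature carries a shed priority. The feature names, stress levels, and priorities below are invented for illustration:

```typescript
// Sketch: gate non-essential features by current backend stress level.
type StressLevel = "normal" | "elevated" | "critical";

// Priority 0 = core functionality, never shed; higher = shed sooner.
const featurePriorities: Record<string, number> = {
  authentication: 0,
  transactions: 0,
  recommendations: 1, // shed under critical stress
  analytics: 2, // shed first, under elevated stress
};

function isFeatureEnabled(feature: string, stress: StressLevel): boolean {
  const priority = featurePriorities[feature] ?? 1;
  if (stress === "critical") return priority === 0;
  if (stress === "elevated") return priority <= 1;
  return true;
}
```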

Health Checks and Automatic Failover

Resilience patterns like circuit breakers are reactive — they detect failures after they affect requests. Health checks are proactive — they detect problems before clients are impacted.

Active vs. Passive Health Checks

  • Active health checks — the gateway periodically sends synthetic requests to each backend (e.g., hitting a /health endpoint every 10 seconds). If a backend fails consecutive health checks, the gateway stops routing traffic to it
  • Passive health checks — the gateway monitors the responses from real traffic. A spike in 5xx errors or timeouts causes the backend to be marked as unhealthy

The best approach uses both. Active checks catch backends that are down but not yet receiving traffic. Passive checks catch degradation that a simple health endpoint might miss — like a service that returns 200 on its health endpoint but times out on real queries.
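
Passive health checking can be sketched as a rolling window of recent outcomes. The class name, window size, and threshold are illustrative:

```typescript
// Sketch: mark a backend unhealthy when the error rate over the last
// N observed responses crosses a threshold.
class PassiveHealthTracker {
  private outcomes: boolean[] = []; // true = successful response

  constructor(
    private readonly windowSize: number = 50,
    private readonly maxErrorRate: number = 0.5,
  ) {}

  record(success: boolean): void {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  isHealthy(): boolean {
    if (this.outcomes.length === 0) return true; // no data: assume healthy
    const errors = this.outcomes.filter((ok) => !ok).length;
    return errors / this.outcomes.length < this.maxErrorRate;
  }
}
```

The gateway would call `record(response.ok)` on every real response and consult `isHealthy()` when choosing where to route, combining this with periodic active probes against a /health endpoint.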

Edge-Native Failover

Zuplo’s edge-native architecture provides infrastructure-level failover that self-hosted gateways can’t match. Your API gateway runs across 300+ edge locations worldwide, and if one location experiences issues, traffic automatically routes to the nearest healthy location. This happens at the network layer — no configuration, no custom code, no manual intervention.

For teams with stricter requirements, Zuplo’s managed dedicated architecture supports multi-region deployments with global load balancing. Traffic routes to the closest healthy region, and if an entire region experiences an outage, a global load balancer handles failover automatically.

Combining Patterns: A Layered Resilience Strategy

These patterns work best when layered together. Here’s how they compose for a typical API:

  1. Rate limiting is the outermost layer — it caps inbound traffic before any processing happens, protecting everything downstream
  2. Timeouts ensure no single request consumes resources indefinitely
  3. Retries with backoff handle transient failures transparently, so clients don’t need to implement their own retry logic
  4. Circuit breakers detect sustained failures and fast-fail requests, preventing retry storms and giving backends time to recover
  5. Bulkheads isolate failures to specific services or consumers, preventing blast radius from spreading
  6. Graceful degradation kicks in when all else fails, serving cached responses or reduced functionality instead of errors
  7. Health checks continuously monitor backend health and remove unhealthy instances from the routing pool before clients are affected

The order matters. Retries should happen inside circuit breakers — if the circuit is open, there’s no point retrying. Rate limiting should be enforced before retries — you don’t want retries consuming rate limit quota.

How Major API Gateways Handle Resilience

Different gateways take different approaches to resilience. Here’s how the major options compare.

Kong

Kong provides circuit breakers and health checks through its built-in upstream health checking system. Active and passive health checks are configurable per-upstream, and circuit breaking is tightly coupled to the health check mechanism. Retry and timeout configuration is available declaratively. Implementing custom resilience logic requires writing plugins in Lua (Kong’s primary plugin language), Go, or via gRPC — Lua being a less common language for most API development teams.

AWS API Gateway

AWS API Gateway offers integration timeouts (29 seconds by default for REST APIs, with the option to request an increase; 30 seconds for HTTP APIs) and automatic retry logic for some integration types. There is no built-in circuit breaker — teams typically implement this pattern using a combination of AWS Lambda, Step Functions, and CloudWatch alarms. Complex resilience patterns require orchestrating multiple AWS services together.

Azure API Management

Azure APIM provides declarative retry policies, circuit breaker behavior through its backend entity configuration, and timeout settings. These are configured via XML policy expressions. Custom logic is possible but constrained to Azure’s policy expression syntax, which is less flexible than a general-purpose programming language.

Tyk

Tyk offers built-in circuit breakers with configurable thresholds and return-to-service checks. Built-in rate limiting, enforced timeouts, and retry logic are available through declarative configuration. Custom plugins are written in Go, JavaScript, Python, Lua, or any gRPC-supported language — offering more language choices than Kong but still requiring gateway-specific plugin development.

Implementing Resilience with Zuplo

Zuplo’s approach to resilience differs from traditional gateways in a fundamental way: instead of offering a fixed menu of declarative resilience options, Zuplo gives you a programmable handler pipeline where you write resilience logic in TypeScript.

This matters because resilience requirements are rarely one-size-fits-all. A payment endpoint needs different retry behavior than a search endpoint. A consumer on a free tier should hit circuit breakers at different thresholds than an enterprise customer. A programmable gateway lets you express these nuances in code rather than fighting with declarative configuration limits.

Key Zuplo capabilities for building resilient APIs:

  • Custom inbound and outbound policies — implement circuit breakers, retry logic, timeout enforcement, and fallback behavior in TypeScript using custom policies
  • Edge-native architecture — 300+ global edge locations provide inherent redundancy and automatic failover at the network layer
  • Built-in rate limiting — the rate limiting policy supports per-consumer, per-route, and custom bucket-based limiting out of the box
  • ZoneCache for fallback data — ZoneCache provides low-latency caching for implementing fallback responses and circuit breaker state
  • RFC 7807 error responses — the HttpProblems helper generates standardized error responses, so clients can programmatically handle degradation scenarios
  • Runtime hooks — runtime extensions let you add global error handlers and request lifecycle hooks

The trade-off between Zuplo and declarative gateways is clear: declarative gateways get you basic resilience patterns faster, while a programmable gateway gives you the flexibility to implement sophisticated, context-aware resilience logic tailored to your specific requirements.

Getting Started

Building a resilient API gateway isn’t an all-or-nothing effort. Start with the patterns that address your most common failure modes:

  1. Add rate limiting first — it’s the simplest pattern with the biggest impact. Zuplo’s rate limiting policy can be added to any route in minutes
  2. Set appropriate timeouts — review your route configurations and ensure every backend connection has a reasonable timeout
  3. Implement circuit breakers for critical backends — start with your most failure-prone or most critical downstream services
  4. Add cached fallbacks for read-heavy endpoints — use ZoneCache to serve stale data when backends fail
  5. Monitor and iterate — use your gateway’s observability data to identify which routes need stronger resilience policies

Ready to add resilience to your API? Sign up for Zuplo and start building with programmable policies in minutes, or explore the Zuplo documentation to learn more about custom policies, rate limiting, and edge-native deployment.