---
title: "API Gateway Resilience and Fault Tolerance: Circuit Breakers, Retries, and Graceful Degradation"
description: "Learn how to implement resilience patterns like circuit breakers, retries, timeouts, and graceful degradation at the API gateway layer to protect your services from cascading failures."
canonicalUrl: "https://zuplo.com/learning-center/api-gateway-resilience-fault-tolerance"
pageType: "learning-center"
authors: "nate"
tags: "API Gateway, API Best Practices"
image: "https://zuplo.com/og?text=API%20Gateway%20Resilience%20and%20Fault%20Tolerance"
---
When a backend service goes down, the blast radius depends entirely on where you
handle failure. If every microservice implements its own retry logic, timeout
handling, and error responses, you end up with inconsistent behavior, duplicated
effort, and gaps that let failures cascade. The API gateway is the natural place
to centralize resilience — it's the single control point sitting between your
clients and every upstream service.

This guide covers the core resilience patterns you should implement at the
gateway layer, how they work together, and how to build them with a programmable
gateway.

## Why Resilience Belongs at the Gateway Layer

In a microservices architecture, the gateway processes every inbound request
before it reaches your backends. That position makes it uniquely suited for
resilience:

- **Single enforcement point** — resilience policies apply consistently across
  all routes without duplicating logic in every service
- **Early failure detection** — the gateway can detect unhealthy backends and
  stop forwarding traffic before clients experience long timeouts
- **Client isolation** — one consumer's traffic spike doesn't take down services
  for everyone else
- **Centralized observability** — all failures, retries, and circuit breaker
  trips are visible in one place

Without gateway-level resilience, a single slow database query can cascade into
connection pool exhaustion across multiple services, timeout storms that amplify
load, and eventually a full system outage. The patterns below prevent that
cascade at its source.

## Circuit Breaker Pattern

The circuit breaker is the most important resilience pattern for API gateways.
It monitors requests to each backend and automatically stops forwarding traffic
when a service is failing — preventing your gateway from wasting resources on
requests that will never succeed.

### How Circuit Breakers Work

A circuit breaker operates in three states:

- **Closed** — requests flow normally. The breaker tracks failure rates (error
  count, timeout rate, or error percentage) within a rolling time window
- **Open** — when failures exceed a configured threshold, the breaker "trips"
  and immediately returns an error response for all requests to that backend. No
  traffic reaches the failing service, giving it time to recover
- **Half-open** — after a cooldown period, the breaker allows a small number of
  test requests through. If they succeed, the breaker returns to closed. If they
  fail, it goes back to open

This state machine prevents two critical problems. First, it stops retry storms
— when hundreds of clients simultaneously retry against a failing service, each
retry adds more load and makes recovery harder. Second, it gives failing
services breathing room to recover without being hammered by requests that will
time out anyway.

### Implementing a Circuit Breaker in a Programmable Gateway

In a programmable gateway like Zuplo, you can implement circuit breaker logic
directly in TypeScript using a
[custom inbound policy](/docs/policies/custom-code-inbound). Here's a practical
implementation using Zuplo's [ZoneCache](/docs/programmable-api/zone-cache) to
track circuit state across requests:

```typescript
import {
  ZuploContext,
  ZuploRequest,
  ZoneCache,
  HttpProblems,
} from "@zuplo/runtime";

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

export default async function circuitBreakerPolicy(
  request: ZuploRequest,
  context: ZuploContext,
  options: {
    failureThreshold: number;
    cooldownSeconds: number;
    backendId: string;
  },
  policyName: string,
) {
  const cache = new ZoneCache<CircuitState>("circuit-breaker", context);
  const cacheKey = `cb:${options.backendId}`;
  const state = (await cache.get(cacheKey)) ?? {
    failures: 0,
    lastFailure: 0,
    state: "closed" as const,
  };

  // Check if circuit is open
  if (state.state === "open") {
    const elapsed = Date.now() - state.lastFailure;
    if (elapsed < options.cooldownSeconds * 1000) {
      context.log.warn(`Circuit open for ${options.backendId}`);
      return HttpProblems.serviceUnavailable(request, context, {
        detail: "Service temporarily unavailable. Please retry later.",
      });
    }
    // Transition to half-open and persist it so the paired outbound policy
    // can close or re-open the circuit based on the test request's outcome
    state.state = "half-open";
    await cache.put(cacheKey, state, options.cooldownSeconds);
  }

  // Allow request through
  return request;
}
```

This inbound policy handles the circuit state check — it blocks requests when
the circuit is open and passes them through otherwise. You would pair it with a
[custom outbound policy](/docs/policies/custom-code-outbound) that inspects the
backend response, increments the failure counter on errors, and transitions the
circuit to the open state when the threshold is exceeded.
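
As a sketch of what that outbound half might look like, here is one way to
write it, assuming the outbound policy signature from the custom-code-outbound
docs, the same cache key as the inbound policy above, and a simple cumulative
failure counter rather than a true rolling window:

```typescript
import { ZuploContext, ZuploRequest, ZoneCache } from "@zuplo/runtime";

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: "closed" | "open" | "half-open";
}

export default async function circuitBreakerOutbound(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext,
  options: {
    failureThreshold: number;
    cooldownSeconds: number;
    backendId: string;
  },
  policyName: string,
) {
  const cache = new ZoneCache<CircuitState>("circuit-breaker", context);
  const cacheKey = `cb:${options.backendId}`;
  const state = (await cache.get(cacheKey)) ?? {
    failures: 0,
    lastFailure: 0,
    state: "closed" as const,
  };

  if (response.status >= 500) {
    // Count the failure; trip the circuit when the threshold is crossed or
    // when a half-open test request fails
    state.failures += 1;
    state.lastFailure = Date.now();
    if (
      state.failures >= options.failureThreshold ||
      state.state === "half-open"
    ) {
      state.state = "open";
      context.log.warn(`Circuit opened for ${options.backendId}`);
    }
  } else {
    // Any successful response closes the circuit and resets the counter
    state.failures = 0;
    state.state = "closed";
  }

  // TTL here is illustrative; it just needs to outlive the cooldown window
  await cache.put(cacheKey, state, options.cooldownSeconds * 2);
  return response;
}
```

A production version would track failures in a rolling window and limit how
many half-open test requests run concurrently.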

The key advantage of implementing circuit breakers in code rather than
declarative config is flexibility. You can customize failure detection per route
— a payment service might trip after 3 failures, while a search service
tolerates 10. You can also implement sophisticated health scoring that factors
in response times, not just error codes.

### Configuring Circuit Breaker Thresholds

Getting thresholds right requires understanding your backend's failure modes:

- **Failure threshold** — start with 5 failures in a 60-second window. Too low
  triggers false positives from transient errors. Too high lets real outages
  affect clients longer
- **Cooldown period** — 30-60 seconds is typical. This should be long enough for
  the backend to recover from temporary issues
- **Half-open test count** — allow 1-3 requests through to test recovery. If all
  succeed, close the circuit

## Retry Policies and Exponential Backoff

Not every failure is permanent. Network blips, brief connection resets, and
temporary overloads resolve themselves in seconds. Retry policies handle these
transient failures by automatically resending requests — but only when done
carefully.

### Why Naive Retries Are Dangerous

A simple "retry 3 times immediately" policy can make outages worse. If a backend
is struggling under load and 1,000 clients each retry 3 times immediately, you
just tripled the load on an already overloaded service. This is a retry storm,
and it's one of the most common causes of extended outages in microservices
architectures.

### Exponential Backoff with Jitter

The solution is exponential backoff with jitter. Instead of retrying
immediately, each attempt waits exponentially longer, and a random component
prevents synchronized retries:

```typescript
function calculateBackoff(
  attempt: number,
  baseDelayMs: number = 100,
  maxDelayMs: number = 10000,
): number {
  // Exponential backoff: 100ms, 200ms, 400ms, 800ms...
  const exponentialDelay = baseDelayMs * Math.pow(2, attempt);

  // Cap at maximum delay
  const cappedDelay = Math.min(exponentialDelay, maxDelayMs);

  // Add jitter: random value between 0 and the calculated delay
  const jitter = Math.random() * cappedDelay;

  return Math.floor(jitter);
}
```
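
To see how this function slots into an actual retry loop, here is a minimal
sketch; the three-attempt limit and the retry-on-5xx condition are illustrative
assumptions rather than recommendations:

```typescript
async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  maxAttempts = 3,
): Promise<Response> {
  let lastError: unknown;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch(url, init);
      // Only retry server errors; 4xx responses go back to the caller as-is
      if (response.status < 500) {
        return response;
      }
      lastError = new Error(`Backend returned ${response.status}`);
    } catch (error) {
      // Network errors (connection resets, DNS failures) are also retryable
      lastError = error;
    }

    if (attempt < maxAttempts - 1) {
      // Wait for an exponentially growing, jittered delay before retrying
      const delay = calculateBackoff(attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}
```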

### Idempotency Considerations

Before adding retries to a route, you must determine whether the operation is
safe to retry:

- **Safe to retry** — GET requests, idempotent PUT requests, operations with
  idempotency keys
- **Dangerous to retry** — non-idempotent POST requests (creating orders,
  processing payments) unless the backend supports idempotency keys
- **Conditionally safe** — requests where you can check whether the original
  succeeded before retrying

Your gateway should only enable automatic retries on routes where the backend
operation is idempotent. For non-idempotent operations, return the error to the
client and let them decide whether to retry with an idempotency key.
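
One way to encode that rule at the gateway is a small guard that decides
whether a request is eligible for automatic retries; the `Idempotency-Key`
header check is an assumption about your backend's conventions:

```typescript
// HTTP methods that are idempotent by definition (RFC 9110)
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "PUT", "DELETE"]);

function isSafeToRetry(request: Request): boolean {
  if (IDEMPOTENT_METHODS.has(request.method)) {
    return true;
  }
  // POST is only retryable if the client supplied an idempotency key that
  // the backend is known to honor (an assumption about your backend contract)
  return request.headers.has("Idempotency-Key");
}
```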

## Timeout Management

Timeouts are the most fundamental resilience mechanism. Without them, a hung
backend silently consumes gateway connections until the entire system grinds to
a halt.

### Types of Timeouts

A complete timeout strategy involves multiple layers:

- **Connection timeout** — how long the gateway waits to establish a TCP
  connection with the backend. Keep this short (5-10 seconds). If a backend
  isn't accepting connections, waiting longer won't help
- **Read timeout** — how long the gateway waits for the backend to start
  responding after the connection is established. This varies by operation — a
  simple lookup might need 5 seconds, while a report generation endpoint might
  need 60
- **Total request timeout** — the maximum end-to-end time for the entire
  request, including retries. This protects against scenarios where individual
  timeouts are acceptable but the cumulative time is too long

### Per-Route Timeout Configuration

Different endpoints have fundamentally different performance characteristics. A
health check should respond in milliseconds. A data export might legitimately
take 30 seconds. Your gateway should configure timeouts per route, not globally.

In Zuplo, the [platform limits](/docs/articles/limits) define default connection
timeouts between your gateway and origin servers — for example, a 19-second TCP
connection timeout and a 180-second proxy read timeout on the managed edge. You
can implement stricter per-route timeouts in a custom policy using
`AbortSignal`:

```typescript
import { ZuploContext, ZuploRequest, HttpProblems } from "@zuplo/runtime";

export default async function timeoutPolicy(
  request: ZuploRequest,
  context: ZuploContext,
  options: { timeoutMs: number },
  policyName: string,
) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), options.timeoutMs);

  try {
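    // Note: request.url is used here for simplicity; depending on where this
    // policy runs in your pipeline, you may need to target the backend origin
    // URL for the route instead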
    const response = await fetch(request.url, {
      method: request.method,
      headers: request.headers,
      body: request.body,
      signal: controller.signal,
    });
    clearTimeout(timeoutId);
    return response;
  } catch (error) {
    clearTimeout(timeoutId);
    if (error instanceof DOMException && error.name === "AbortError") {
      context.log.error(`Request timed out after ${options.timeoutMs}ms`);
      return HttpProblems.gatewayTimeout(request, context, {
        detail: `Backend did not respond within ${options.timeoutMs}ms`,
      });
    }
    throw error;
  }
}
```

## Bulkhead Pattern

The bulkhead pattern borrows from ship design — bulkheads divide a ship's hull
into watertight compartments so that a breach in one section doesn't flood the
entire vessel. Applied to APIs, the pattern isolates resources per service or
consumer to prevent cascade failures.

### Why Bulkheads Matter

Without isolation, all routes through your gateway share the same connection
pool. A single slow backend can exhaust every available connection, blocking
requests to perfectly healthy services. Consider this scenario:

1. Your payment service starts responding slowly (5-second responses instead of
   200ms)
2. Gateway connections to the payment service pile up, each waiting for a
   response
3. The shared connection pool fills up
4. Requests to your user service, product service, and search service all start
   failing — not because those services are unhealthy, but because there are no
   available connections

### Implementing Bulkheads

Bulkhead implementation at the gateway layer typically means limiting concurrent
requests per backend or per consumer:

- **Per-service bulkheads** — limit the maximum concurrent requests to each
  backend service. If the payment service gets 50 concurrent connection slots,
  it can only consume those 50, even if it's slow. The remaining connection
  capacity stays available for other services
- **Per-consumer bulkheads** — limit concurrent requests per API key or user.
  One consumer's burst of traffic can't monopolize the gateway's resources

In Zuplo, you can combine [rate limiting](/docs/policies/rate-limit-inbound) per
consumer with custom policies that track concurrent connections per backend to
implement effective bulkhead isolation.
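
As an illustration of the per-service variant, the sketch below wraps backend
calls with an in-memory concurrency counter. The counter is scoped to a single
isolate in an edge runtime, so the limit is approximate rather than globally
exact, and the fast-fail response is a design choice rather than a requirement:

```typescript
// Per-backend concurrency counters, scoped to this isolate
const inFlight = new Map<string, number>();

async function fetchWithBulkhead(
  backendId: string,
  maxConcurrent: number,
  url: string,
  init?: RequestInit,
): Promise<Response> {
  const current = inFlight.get(backendId) ?? 0;

  if (current >= maxConcurrent) {
    // Fast-fail instead of queueing so a slow backend cannot hold connections
    return new Response("Too many concurrent requests to this service", {
      status: 503,
      headers: { "Retry-After": "1" },
    });
  }

  inFlight.set(backendId, current + 1);
  try {
    return await fetch(url, init);
  } finally {
    inFlight.set(backendId, (inFlight.get(backendId) ?? 1) - 1);
  }
}
```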

## Rate Limiting as a Resilience Tool

Rate limiting is usually discussed as an access control mechanism, but it's
equally important as a resilience pattern. By capping the request rate per
consumer, IP address, or API key, rate limiting prevents any single source of
traffic from overwhelming your backends.

### How Rate Limiting Protects Your Stack

Rate limiting provides several resilience benefits:

- **Backend protection** — even if a client misbehaves, they can't send more
  requests than your backend can handle
- **Consumer isolation** — one consumer's aggressive usage doesn't degrade
  service for others
- **Predictable capacity** — with rate limits in place, you can size your
  backend infrastructure based on known maximum throughput
- **DDoS mitigation** — rate limits provide a first line of defense against
  volumetric attacks

Zuplo's built-in [rate limiting policy](/docs/policies/rate-limit-inbound)
supports limiting by IP address, authenticated user, API key, or custom
attributes. You can set different limits per route, per consumer tier, or per
plan — and when limits are exceeded, clients receive a standard
`429 Too Many Requests` response with headers indicating when they can retry.

You can also implement dynamic rate limiting with a
[custom function](/docs/policies/rate-limit-inbound#using-a-custom-function)
that adjusts limits based on runtime conditions — for example, reducing allowed
requests when your backend's response time exceeds a threshold.
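
The decision logic for such a function might look like the sketch below. The
latency thresholds and limits are illustrative assumptions, and the exact
signature the rate-limiting policy expects from a custom function is described
in the linked docs:

```typescript
// Illustrative only: derive a per-minute request limit from a recent p95
// latency reading for the backend
function dynamicRequestLimit(p95LatencyMs: number): number {
  const normalLimit = 600; // healthy backend
  const degradedLimit = 200; // backend is slowing down
  const floorLimit = 50; // severe degradation

  if (p95LatencyMs < 500) {
    return normalLimit;
  }
  if (p95LatencyMs < 2000) {
    return degradedLimit;
  }
  return floorLimit;
}
```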

## Graceful Degradation Strategies

When a backend fails, the worst response is no response. Graceful degradation
means returning something useful — even if it's not the complete or freshest
data — instead of an error.

### Serving Cached Responses

If a backend is unavailable, the gateway can serve a previously cached response.
This works well for data that doesn't change frequently — product catalogs,
configuration endpoints, or content feeds. The client gets slightly stale data
instead of a 503 error.

Zuplo provides multiple caching layers to support this pattern. The built-in
[caching policy](/docs/policies/caching-inbound) automatically caches responses
based on configurable TTLs, and the
[ZoneCache](/docs/programmable-api/zone-cache) API gives you programmatic
control for implementing custom cache-on-failure logic:

```typescript
import {
  ZuploContext,
  ZuploRequest,
  ZoneCache,
  HttpProblems,
} from "@zuplo/runtime";

export default async function handler(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const cache = new ZoneCache<string>("fallback-cache", context);
  const cacheKey = new URL(request.url).pathname;

  try {
    const response = await fetch("https://api.backend.example.com" + cacheKey);

    if (response.ok) {
      const body = await response.text();
      // Cache successful responses for fallback use
      cache.put(cacheKey, body, 300).catch((err) => context.log.error(err));
      return new Response(body, { headers: response.headers });
    }

    throw new Error(`Backend returned ${response.status}`);
  } catch (error) {
    // Try to serve from cache on failure
    const cached = await cache.get(cacheKey);
    if (cached) {
      context.log.warn("Serving cached fallback response");
      return new Response(cached, {
        headers: { "X-Fallback": "true", "Content-Type": "application/json" },
      });
    }

    return HttpProblems.serviceUnavailable(request, context, {
      detail: "Service temporarily unavailable",
    });
  }
}
```

### Fallback Endpoints

For critical paths, configure fallback backends that the gateway routes to when
the primary fails. This is different from load balancing — the fallback might be
a simplified service that handles the most common request patterns, or a static
endpoint returning default data.
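
A handler implementing this might try the primary origin first and only route
to the fallback when the primary fails; the URLs and the fall-back condition
below are assumptions for illustration:

```typescript
import { ZuploContext, ZuploRequest, HttpProblems } from "@zuplo/runtime";

// Hypothetical primary and fallback origins
const PRIMARY = "https://api.primary.example.com";
const FALLBACK = "https://api.fallback.example.com";

export default async function fallbackHandler(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const path = new URL(request.url).pathname;

  try {
    const primary = await fetch(PRIMARY + path, { headers: request.headers });
    if (primary.ok) {
      return primary;
    }
    context.log.warn(`Primary returned ${primary.status}, trying fallback`);
  } catch {
    context.log.warn("Primary unreachable, trying fallback");
  }

  try {
    // The fallback might be a simplified service or a static endpoint that
    // returns sensible defaults for the most common request patterns
    return await fetch(FALLBACK + path, { headers: request.headers });
  } catch {
    return HttpProblems.serviceUnavailable(request, context, {
      detail: "Both primary and fallback backends are unavailable",
    });
  }
}
```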

### Reduced Functionality Mode

Not every feature of your API is equally critical. When backends are under
stress, the gateway can disable non-essential features (recommendations, related
items, analytics tracking) while keeping core functionality (authentication,
core data, transactions) fully operational.

## Health Checks and Automatic Failover

Resilience patterns like circuit breakers are reactive — they detect failures
after they affect requests. Health checks are proactive — they detect problems
before clients are impacted.

### Active vs. Passive Health Checks

- **Active health checks** — the gateway periodically sends synthetic requests
  to each backend (e.g., hitting a `/health` endpoint every 10 seconds). If a
  backend fails consecutive health checks, the gateway stops routing traffic to
  it
- **Passive health checks** — the gateway monitors responses from real
  traffic. A spike in 5xx errors or timeouts causes the backend to be marked
  unhealthy

The best approach uses both. Active checks catch backends that are down but not
yet receiving traffic. Passive checks catch degradation that a simple health
endpoint might miss — like a service that returns 200 on its health endpoint but
times out on real queries.
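
As a sketch of the passive side, a small tracker can record outcomes from real
traffic and mark a backend unhealthy after a run of consecutive failures; the
threshold and the consecutive-failure rule are simplifying assumptions, since
real implementations usually track error rates over rolling windows:

```typescript
class PassiveHealthTracker {
  private consecutiveFailures = new Map<string, number>();

  constructor(private failureThreshold = 5) {}

  // Call this with the result of every proxied request (null for timeouts
  // or network errors)
  recordResult(backendId: string, response: Response | null): void {
    const failed = response === null || response.status >= 500;
    const count = failed
      ? (this.consecutiveFailures.get(backendId) ?? 0) + 1
      : 0;
    this.consecutiveFailures.set(backendId, count);
  }

  isHealthy(backendId: string): boolean {
    return (
      (this.consecutiveFailures.get(backendId) ?? 0) < this.failureThreshold
    );
  }
}
```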

### Edge-Native Failover

Zuplo's [edge-native architecture](/docs/managed-edge/overview) provides
infrastructure-level failover that self-hosted gateways can't match. Your API
gateway runs across
[300+ edge locations worldwide](https://zuplo.com/docs/api-management/introduction),
and if one location experiences issues, traffic automatically routes to the
nearest healthy location. This happens at the network layer — no configuration,
no custom code, no manual intervention.

For teams with stricter requirements, Zuplo's
[managed dedicated architecture](/docs/dedicated/architecture) supports
multi-region deployments with global load balancing. Traffic routes to the
closest healthy region, and if an entire region experiences an outage, a global
load balancer handles failover automatically.

## Combining Patterns: A Layered Resilience Strategy

These patterns work best when layered together. Here's how they compose for a
typical API:

1. **Rate limiting** is the outermost layer — it caps inbound traffic before any
   processing happens, protecting everything downstream
2. **Timeouts** ensure no single request consumes resources indefinitely
3. **Retries with backoff** handle transient failures transparently, so clients
   don't need to implement their own retry logic
4. **Circuit breakers** detect sustained failures and fast-fail requests,
   preventing retry storms and giving backends time to recover
5. **Bulkheads** isolate failures to specific services or consumers, preventing
   blast radius from spreading
6. **Graceful degradation** kicks in when all else fails, serving cached
   responses or reduced functionality instead of errors
7. **Health checks** continuously monitor backend health and remove unhealthy
   instances from the routing pool before clients are affected

The order matters. Retries should happen inside circuit breakers — if the
circuit is open, there's no point retrying. Rate limiting should be enforced
before retries — you don't want retries consuming rate limit quota.

## How Major API Gateways Handle Resilience

Different gateways take different approaches to resilience. Here's how the major
options compare.

### Kong

Kong provides circuit breakers and health checks through its built-in upstream
health checking system. Active and passive health checks are configurable
per-upstream, and circuit breaking is tightly coupled to the health check
mechanism. Retry and timeout configuration is available declaratively.
Implementing custom resilience logic requires writing plugins in Lua (Kong's
primary plugin language), in Go, or via gRPC, and Lua is a less common language
for most API development teams.

### AWS API Gateway

AWS API Gateway offers integration timeouts (29 seconds by default for REST
APIs, with the option to request an increase; 30 seconds for HTTP APIs) and
automatic retry logic for some integration types. There is no built-in circuit
breaker — teams typically implement this pattern using a combination of AWS
Lambda, Step Functions, and CloudWatch alarms. Complex resilience patterns
require orchestrating multiple AWS services together.

### Azure API Management

Azure APIM provides declarative retry policies, circuit breaker behavior through
its backend entity configuration, and timeout settings. These are configured via
XML policy expressions. Custom logic is possible but constrained to Azure's
policy expression syntax, which is less flexible than a general-purpose
programming language.

### Tyk

Tyk offers built-in circuit breakers with configurable thresholds and
return-to-service checks. Built-in rate limiting, enforced timeouts, and retry
logic are available through declarative configuration. Custom plugins are
written in Go, JavaScript, Python, Lua, or any gRPC-supported language —
offering more language choices than Kong but still requiring gateway-specific
plugin development.

## Implementing Resilience with Zuplo

Zuplo's approach to resilience differs from traditional gateways in a
fundamental way: instead of offering a fixed menu of declarative resilience
options, Zuplo gives you a
[programmable handler pipeline](/docs/articles/policies) where you write
resilience logic in TypeScript.

This matters because resilience requirements are rarely one-size-fits-all. A
payment endpoint needs different retry behavior than a search endpoint. A
consumer on a free tier should hit circuit breakers at different thresholds than
an enterprise customer. A programmable gateway lets you express these nuances in
code rather than fighting with declarative configuration limits.

Key Zuplo capabilities for building resilient APIs:

- **Custom inbound and outbound policies** — implement circuit breakers, retry
  logic, timeout enforcement, and fallback behavior in TypeScript using
  [custom policies](/docs/policies/custom-code-inbound)
- **Edge-native architecture** — 300+ global edge locations provide inherent
  redundancy and [automatic failover](/docs/managed-edge/overview) at the
  network layer
- **Built-in rate limiting** — the
  [rate limiting policy](/docs/policies/rate-limit-inbound) supports
  per-consumer, per-route, and custom bucket-based limiting out of the box
- **ZoneCache for fallback data** —
  [ZoneCache](/docs/programmable-api/zone-cache) provides low-latency caching
  for implementing fallback responses and circuit breaker state
- **RFC 7807 error responses** — the
  [HttpProblems helper](/docs/programmable-api/http-problems) generates
  standardized error responses, so clients can programmatically handle
  degradation scenarios
- **Runtime hooks** —
  [runtime extensions](/docs/programmable-api/runtime-extensions) let you add
  global error handlers and request lifecycle hooks

The trade-off between Zuplo and declarative gateways is clear: declarative
gateways get you basic resilience patterns faster, while a programmable gateway
gives you the flexibility to implement sophisticated, context-aware resilience
logic tailored to your specific requirements.

## Getting Started

Building a resilient API gateway isn't an all-or-nothing effort. Start with the
patterns that address your most common failure modes:

1. **Add rate limiting first** — it's the simplest pattern with the biggest
   impact. Zuplo's [rate limiting policy](/docs/policies/rate-limit-inbound) can
   be added to any route in minutes
2. **Set appropriate timeouts** — review your route configurations and ensure
   every backend connection has a reasonable timeout
3. **Implement circuit breakers for critical backends** — start with your most
   failure-prone or most critical downstream services
4. **Add cached fallbacks for read-heavy endpoints** — use
   [ZoneCache](/docs/programmable-api/zone-cache) to serve stale data when
   backends fail
5. **Monitor and iterate** — use your gateway's observability data to identify
   which routes need stronger resilience policies

Ready to add resilience to your API?
[Sign up for Zuplo](https://portal.zuplo.com/signup) and start building with
programmable policies in minutes, or explore the [Zuplo documentation](/docs) to
learn more about custom policies, rate limiting, and edge-native deployment.