Rate Limiting

How rate limiting works

This page covers the mechanics behind Zuplo's rate limiter: how requests are counted, what each rateLimitBy mode does in detail, and every configuration option available. If you just want to add a rate limit to your API, start with the Getting Started guide instead — this page is the deep dive you can read alongside or after it.

Zuplo's rate limiter uses a sliding window algorithm enforced globally across all edge locations. Unlike a fixed window algorithm (which resets counters at fixed intervals and can allow bursts at window boundaries), the sliding window continuously tracks requests over a rolling time period. This produces smoother, more predictable throttling behavior.
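To make the boundary-burst difference concrete, here is a minimal single-process sketch of both algorithms. It is illustrative only: the class and function names are hypothetical, the sliding window is implemented as a simple timestamp log, and none of this is Zuplo's actual implementation, which runs across edge locations.

```typescript
// Illustrative only: a single-process comparison, not Zuplo's implementation.

// Fixed window: the counter resets at fixed interval boundaries.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // A new window began, so the counter starts over.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// Sliding window (log variant): only requests inside the trailing window count.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop requests that have aged out of the rolling window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Send 10 requests just before a minute boundary and 10 just after it.
function burstAccepted(limiter: { allow(nowMs: number): boolean }): number {
  let accepted = 0;
  for (let i = 0; i < 10; i++) if (limiter.allow(59_000 + i)) accepted++;
  for (let i = 0; i < 10; i++) if (limiter.allow(60_000 + i)) accepted++;
  return accepted;
}

const fixedAccepted = burstAccepted(new FixedWindowLimiter(10, 60_000)); // 20
const slidingAccepted = burstAccepted(new SlidingWindowLimiter(10, 60_000)); // 10
```

With a limit of 10 per minute, the fixed window accepts all 20 requests because the counter resets at the boundary, while the sliding window accepts only the first 10.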

Key terms

A few terms show up repeatedly in the rate limiting docs. They are related but not interchangeable.

Counter (or bucket) — The running tally Zuplo keeps for a single caller and a single policy. Each unique combination of policy name and caller identifier gets its own counter. Two different policies tracking the same caller do not share a counter; two different callers under the same policy do not share a counter either.
Rate limit key — The string value that identifies a caller for bucketing. For rateLimitBy: "ip" the key is the client's IP address; for "user" it is request.user.sub; for "function" it is whatever your custom function returns as CustomRateLimitDetails.key; for "all" there is a single implicit key shared by every request to the route.
identifier option — A field in the policy's configuration that points Zuplo at your custom TypeScript function when rateLimitBy is "function". Zuplo calls that function on each request, and the function returns a CustomRateLimitDetails object whose key property becomes the rate limit key. In short: identifier is where the function lives; key is what the function returns.

How `rateLimitBy` works

The rateLimitBy option determines how the rate limiter groups requests into buckets. Both the standard Rate Limiting policy and the Complex Rate Limiting policy support the same four modes.

`ip`

Groups requests by the client's IP address. No authentication is required. This is the simplest option and works well for public APIs or as a first layer of protection.

Multiple clients behind the same corporate proxy, cloud NAT, or shared Wi-Fi network can share a single IP address. In these cases, IP-based rate limiting can unfairly throttle unrelated users. For authenticated APIs, prefer rateLimitBy: "user" instead.

`user`

Groups requests by the authenticated user's identity (request.user.sub). When using API key authentication, the sub value is the consumer name you assigned when creating the API key. When using JWT authentication, it comes from the token's sub claim.

This is the recommended mode for authenticated APIs because it ties limits to the actual consumer rather than a shared IP address.

The user mode requires an authentication policy (such as API key or JWT authentication) earlier in the policy pipeline. If no authenticated user is present on the request, the policy returns an error. See Getting Started §5 for a full authenticated pipeline example.

`function`

Groups requests using a custom TypeScript function that you provide. The function returns a CustomRateLimitDetails object containing a grouping key and, optionally, overridden values for requestsAllowed and timeWindowMinutes. See Custom rate limit functions below for the function signature and field reference.

`all`

Applies a single shared counter across all requests to the route, regardless of who makes them. Use this for global rate limits on endpoints that call resource-constrained backends.
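The four modes can be summarized as a key-derivation step followed by a counter lookup. The sketch below is a mental model only: `KeyRequest`, `clientIp`, `rateLimitKey`, `counterId`, and the `"*"` placeholder key are hypothetical names chosen for illustration, not part of the Zuplo runtime.

```typescript
// Hypothetical stand-in types; the real request object comes from @zuplo/runtime.
type RateLimitBy = "ip" | "user" | "function" | "all";

interface KeyRequest {
  clientIp: string; // assumed field name, for illustration only
  user?: { sub: string }; // populated by an authentication policy
}

// Returns the rate limit key for a request under the given mode.
function rateLimitKey(
  mode: RateLimitBy,
  request: KeyRequest,
  customKey?: (request: KeyRequest) => string,
): string {
  switch (mode) {
    case "ip":
      return request.clientIp;
    case "user":
      // "user" mode needs an authenticated user on the request.
      if (!request.user) throw new Error("No authenticated user on the request");
      return request.user.sub;
    case "function":
      if (!customKey) throw new Error("function mode needs a custom function");
      return customKey(request);
    case "all":
      return "*"; // placeholder for the single implicit key shared by every request
  }
}

// Each (policy name, key) pair identifies its own counter.
function counterId(policyName: string, key: string): string {
  return `${policyName}:${key}`;
}
```

Under this model, two policies tracking the same caller produce different counter IDs, and two callers under one policy do too.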

Sub-minute time windows

Time windows don't have to be whole minutes. The timeWindowMinutes option accepts fractional values, so you can define windows measured in seconds:


Code
{
  "name": "burst-protection",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 10,
      "timeWindowMinutes": 0.5
    }
  }
}

This policy allows each user 10 requests in any 30-second sliding window. Some useful values:

| `timeWindowMinutes` | Window length |
| --- | --- |
| `0.5` | 30 seconds |
| `0.25` | 15 seconds |
| `0.1` | 6 seconds |

Fractional values work anywhere timeWindowMinutes appears — including the Complex Rate Limiting policy and the values returned by a custom rate limit function.
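If you think in seconds, the conversion is a division by 60. The helper below is a small convenience sketch; `secondsToWindowMinutes` is a hypothetical name, not a Zuplo API.

```typescript
// Convert a window length in seconds to a timeWindowMinutes value.
function secondsToWindowMinutes(seconds: number): number {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new RangeError("Window length must be a positive number of seconds");
  }
  return seconds / 60;
}

const thirtySeconds = secondsToWindowMinutes(30); // 0.5
const fifteenSeconds = secondsToWindowMinutes(15); // 0.25
const sixSeconds = secondsToWindowMinutes(6); // 0.1
```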

Short windows pair well with a longer sustained limit on the same route: the tight window absorbs bursts while the longer one enforces overall usage. See Combining Policies for how to stack rate limits.

Short windows and accuracy

The rate limiter is globally distributed: counters synchronize across edge locations worldwide, so a limit applies consistently regardless of which region serves a request. That synchronization can't outrun the speed of light (Zuplo is working on bending physics). As a window shrinks toward a few seconds, synchronization takes up a bigger fraction of the window, so counts from different regions can fall out of step before the window elapses — and a small number of requests that should exceed the limit can slip through.

Windows of roughly 5 seconds or longer give the most consistent enforcement. Shorter windows still work; they just carry a slightly higher chance that a request over the limit is allowed through.

Custom rate limit functions

When rateLimitBy is set to "function", Zuplo calls a TypeScript function you provide on every request. The function receives the request, context, and policy name, and returns a CustomRateLimitDetails object describing how to count that request.


Code
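import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

// Called on every request when rateLimitBy is "function".
export function rateLimit(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails | undefined {
  return {
    // Bucket by authenticated user; request.user is set by an earlier
    // authentication policy in the pipeline.
    key: request.user.sub,
    // Optional per-request overrides of the policy's configured values.
    requestsAllowed: 100,
    timeWindowMinutes: 1,
  };
}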

`CustomRateLimitDetails`

key (required) — The string used to group requests into rate limit buckets.
requestsAllowed (optional) — Overrides the policy's requestsAllowed value for this request.
timeWindowMinutes (optional) — Overrides the policy's timeWindowMinutes value for this request.

Returning undefined skips rate limiting for the request entirely — useful for health checks or privileged callers. The function can also be async if you need to await a database lookup or external service call.
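The sketch below combines both ideas: it skips health checks by returning undefined and awaits a lookup before choosing a limit. To stay self-contained it declares local stand-ins for the types (in a Zuplo module they come from "@zuplo/runtime" and the function is exported). The `/health` path, the `planLimits` map, the `lookupRequestsAllowed` helper, and the shared `"anonymous"` bucket are all assumptions made for illustration.

```typescript
// Local stand-ins for the shapes described above. In a Zuplo module these types
// come from "@zuplo/runtime", and the function must be exported.
interface CustomRateLimitDetails {
  key: string;
  requestsAllowed?: number;
  timeWindowMinutes?: number;
}
interface LimitRequest {
  url: string;
  user?: { sub: string };
}

// Hypothetical per-customer limits; a real function might await a database here.
const planLimits = new Map<string, number>([["acme", 1000]]);
async function lookupRequestsAllowed(sub: string): Promise<number | undefined> {
  return planLimits.get(sub);
}

async function rateLimit(
  request: LimitRequest,
): Promise<CustomRateLimitDetails | undefined> {
  // Health checks are not counted at all (assumed path).
  if (request.url.endsWith("/health")) return undefined;

  // A choice made by this sketch: unauthenticated callers share one bucket
  // and use the policy's configured defaults.
  if (!request.user) return { key: "anonymous" };

  const requestsAllowed = await lookupRequestsAllowed(request.user.sub);
  // Omitting requestsAllowed falls back to the policy's configured value.
  return requestsAllowed === undefined
    ? { key: request.user.sub }
    : { key: request.user.sub, requestsAllowed };
}
```

Because every request waits on the lookup, a production version would likely cache the result rather than query a database each time.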

Wire the function into the policy using the identifier option. The policy's configured requestsAllowed and timeWindowMinutes serve as defaults; the function can override them per request.

For concrete walkthroughs (tier-based, route-based, method-based, database-backed, selective bypass), see Dynamic Rate Limiting. For an advanced database-backed example with caching, see Per-user rate limiting with a database.

Additional options

Both rate limiting policies support the following additional options:

headerMode — Set to "retry-after" (default) to include the Retry-After header in 429 responses, or "none" to omit it. The Retry-After value is returned as a number of seconds (delay-seconds format).
mode — Set to "strict" (default) or "async". In strict mode, the request is held until the rate limit check completes — the backend is never called if the limit is exceeded. This adds some latency to every request because the check hits a globally distributed rate limit service. In async mode, the request proceeds to the backend in parallel with the rate limit check. This minimizes added latency but means some requests may get through even after the limit is exceeded. Async mode is a good fit when low latency matters more than exact enforcement.
throwOnFailure — Controls behavior when the rate limit service is unreachable. When set to false (default), requests are allowed through (fail-open). When set to true, the policy returns an error to the client. The fail-open default prevents a rate limit service outage from blocking all traffic to your API.
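On the client side, the Retry-After value can drive a backoff. The helper below is a sketch of one way to read it, assuming the header carries whole delay-seconds as described above; `retryDelayMs` and its fallback default are hypothetical, not part of any Zuplo SDK.

```typescript
// Turn a Retry-After value (delay-seconds) into a wait time in milliseconds.
// Returns fallbackMs when the header is absent (for example with
// headerMode "none") or is not a plain non-negative integer.
function retryDelayMs(retryAfter: string | null, fallbackMs: number = 1000): number {
  const value = retryAfter?.trim();
  if (!value || !/^\d+$/.test(value)) return fallbackMs;
  return Number(value) * 1000;
}
```

With the Fetch API, a caller could pass `response.headers.get("Retry-After")` after receiving a 429 and wait that long before retrying.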

Complex Rate Limiting policy

The Complex Rate Limiting policy supports multiple named counters in a single policy. Each counter tracks a different resource or unit of work.


Code
{
  "name": "my-complex-rate-limit-policy",
  "policyType": "complex-rate-limit-inbound",
  "handler": {
    "export": "ComplexRateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "timeWindowMinutes": 1,
      "limits": {
        "requests": 100,
        "compute": 500
      }
    }
  }
}

Override counter increments programmatically per request with ComplexRateLimitInboundPolicy.setIncrements(). This suits usage-based pricing, where different endpoints consume different amounts of a resource (for example, counting compute units or tokens instead of raw requests).

Related resources

Go deeper on configuration:

Rate Limiting policy reference — Every option for the standard policy.
Complex Rate Limiting policy reference — Multi-counter limits for usage-based pricing (enterprise).

Learn by example:

Dynamic Rate Limiting — Tiered limits by customer type.
Per-user rate limiting with a database — Look up limits at request time using ZoneCache and a database.

Combine with other policies:

Combining Policies — Stack multiple rate limits, and pair rate limiting with quotas or monetization.
Quota policy — Monthly or billing-period usage caps.
Monetization policy — Subscription-based access control and metering.

Last modified on July 22, 2026
