
Getting started with rate limiting

Rate limiting caps how many requests a client can make to your API within a time window. It protects your backend from traffic spikes, enforces fair usage across consumers, and supports tiered access for different customer plans. When a client exceeds the configured limit, they receive a 429 Too Many Requests response with a Retry-After header indicating when they can retry.

This guide walks you through picking a rateLimitBy strategy, adding the policy to a route, and testing it end to end. For the sliding window algorithm, every rateLimitBy mode in detail, and the full set of configuration options, read How Rate Limiting Works alongside or after this guide.

Choose an approach

Pick a rateLimitBy mode based on what your API looks like today. If you are not sure, start from the first row that matches and follow the linked guide or section below.

Use case	`rateLimitBy`	Policy	Learn more
Public API with no authentication	`ip`	Rate Limiting	Follow the steps below
Authenticated API, same limit for every consumer	`user`	Rate Limiting	§5 Rate limit authenticated users
Tiered limits (free, pro, enterprise) from API key metadata	`function`	Rate Limiting with a custom function	Dynamic Rate Limiting
Tiered limits sourced from a database	`function`	Rate Limiting with a custom function	Per-user limits with a database
Single global cap on an expensive endpoint	`all`	Rate Limiting	How rate limiting works
Usage-based pricing counting multiple resources per request	`user`	Complex Rate Limiting (enterprise)	How rate limiting works

rateLimitBy: "user" requires an authentication policy (such as API key or JWT authentication) earlier in the route's policy pipeline. Without it, the rate limit policy has no user to group requests by and returns an error. Section 5 below walks through the full authenticated setup.

For a definition of rateLimitBy, the sliding window algorithm, and the full list of configuration options (mode, headerMode, throwOnFailure, and more), see How Rate Limiting Works.

Prerequisites

An existing Zuplo project with at least one route configured in config/routes.oas.json.
The Zuplo CLI installed, or access to the Zuplo Portal.
To test rate limiting locally, the project must be linked to a Zuplo environment. Run npx zuplo link once in the project directory and select an environment. Rate limiting uses a globally distributed counter service, so an unlinked local project cannot enforce limits. See Connecting to Zuplo Services Locally for more detail.

1. Add the policy

Open config/policies.json and add a rate limiting policy to the policies array. This example limits each IP address to 2 requests per minute, which makes it easy to test.


{
  "policies": [
    {
      "name": "rate-limit-inbound",
      "policyType": "rate-limit-inbound",
      "handler": {
        "export": "RateLimitInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "rateLimitBy": "ip",
          "requestsAllowed": 2,
          "timeWindowMinutes": 1
        }
      }
    }
  ]
}

The key options are:

rateLimitBy -- How to group requests into rate limit buckets. "ip" groups by the caller's IP address and requires no authentication.
requestsAllowed -- The maximum number of requests allowed in the time window.
timeWindowMinutes -- The length of the sliding time window in minutes.
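To make the interaction between requestsAllowed and timeWindowMinutes concrete, here is a minimal in-memory sketch of one common sliding window technique (a timestamp log per bucket). It is an illustration under assumptions, not Zuplo's implementation: Zuplo enforces limits through a globally distributed counter service, and the class name SlidingWindowLimiter, its allow method, and the choice not to count rejected requests are all hypothetical.

```typescript
// Illustrative only: a single-process sliding-window log limiter.
// SlidingWindowLimiter and allow() are hypothetical names, not Zuplo APIs.
class SlidingWindowLimiter {
  // One list of request timestamps (in ms) per bucket key, e.g. per IP.
  private readonly hits = new Map<string, number[]>();
  private readonly requestsAllowed: number;
  private readonly windowMs: number;

  constructor(requestsAllowed: number, timeWindowMinutes: number) {
    this.requestsAllowed = requestsAllowed;
    this.windowMs = timeWindowMinutes * 60_000;
  }

  // Returns true if the request is allowed, false if it would get a 429.
  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.requestsAllowed;
    // Assumption: rejected requests are not recorded against the bucket.
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Same numbers as the policy above: 2 requests per 1 minute, keyed by IP.
const limiter = new SlidingWindowLimiter(2, 1);
const results = [
  limiter.allow("203.0.113.7", 0), // first request
  limiter.allow("203.0.113.7", 1_000), // second request
  limiter.allow("203.0.113.7", 2_000), // third request, over the limit
  limiter.allow("198.51.100.4", 2_000), // different IP, separate bucket
  limiter.allow("203.0.113.7", 61_500), // earlier hits have left the window
];
console.log(results); // [ true, true, false, true, true ]
```

The window slides with each request rather than resetting on a fixed minute boundary, which is why the last call succeeds as soon as the earlier timestamps are more than one minute old.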

If your project already has other policies in config/policies.json, add the rate limiting entry to the existing policies array rather than replacing it.

The name field (rate-limit-inbound above) is what scopes the counter. Every route that references this exact name shares the same counter. If you later copy this policy block to create a second limit, change the name — a forgotten rename silently merges two unrelated limits into one. Policy names must also match exactly between config/policies.json and config/routes.oas.json; a typo there causes the policy to be skipped without any error. See Counter scoping for the full rules.
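Because a mistyped policy name fails quietly, it can help to check the two files against each other before deploying. The sketch below is one way to do that. The helper findUndefinedPolicyNames and the types are hypothetical (this is not a Zuplo CLI feature), and the types model only the fields this guide uses.

```typescript
// Hypothetical pre-deploy check, not part of the Zuplo CLI.
// The types below model only the fields shown in this guide.
type PoliciesFile = { policies: { name: string }[] };
type RouteOperation = {
  "x-zuplo-route"?: { policies?: { inbound?: string[] } };
};
type RoutesFile = { paths: Record<string, Record<string, RouteOperation>> };

// Returns every inbound policy name that a route references
// but the policies file does not define.
function findUndefinedPolicyNames(
  policiesFile: PoliciesFile,
  routesFile: RoutesFile,
): string[] {
  const defined = new Set(policiesFile.policies.map((p) => p.name));
  const missing = new Set<string>();
  for (const operations of Object.values(routesFile.paths)) {
    for (const operation of Object.values(operations)) {
      const inbound = operation["x-zuplo-route"]?.policies?.inbound ?? [];
      for (const name of inbound) {
        if (!defined.has(name)) missing.add(name);
      }
    }
  }
  return Array.from(missing).sort();
}

// Example data: the POST route misspells the policy name.
const examplePolicies: PoliciesFile = {
  policies: [{ name: "rate-limit-inbound" }],
};
const exampleRoutes: RoutesFile = {
  paths: {
    "/my-route": {
      get: { "x-zuplo-route": { policies: { inbound: ["rate-limit-inbound"] } } },
      post: { "x-zuplo-route": { policies: { inbound: ["rate-limit-inbuond"] } } },
    },
  },
};

console.log(findUndefinedPolicyNames(examplePolicies, exampleRoutes));
// Logs the misspelled name: [ 'rate-limit-inbuond' ]
```

In a real project you would parse config/policies.json and config/routes.oas.json from disk and pass the resulting objects in. An empty result means every referenced inbound policy is defined.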

2. Attach the policy to a route

Open config/routes.oas.json and add the policy name to the policies.inbound array inside the x-zuplo-route object of the route you want to protect.


{
  "paths": {
    "/my-route": {
      "get": {
        "operationId": "get-my-route",
        "x-zuplo-route": {
          "corsPolicy": "anything-goes",
          "handler": {
            "export": "urlForwardHandler",
            "module": "$import(@zuplo/runtime)",
            "options": {
              "baseUrl": "https://api.example.com"
            }
          },
          "policies": {
            "inbound": ["rate-limit-inbound"]
          }
        }
      }
    }
  }
}

The "rate-limit-inbound" string must match the name field from the policy you defined in config/policies.json. When a request hits this route, Zuplo runs each inbound policy in array order before forwarding to the handler.

You can attach the same policy to multiple routes. Add its name to the policies.inbound array on each route that needs rate limiting.

3. Test the rate limit

Start your local dev server (or deploy to a Zuplo environment) and send requests to the protected route. With the configuration above, the third request within a one-minute window returns a 429 response.

# Send three requests in quick succession
for i in 1 2 3; do
  echo "--- Request $i ---"
  curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:9000/my-route
done

The first two requests return a 200 response from your upstream service. The third request returns a 429 Too Many Requests response in Problem Details format:


{
  "type": "https://httpproblems.com/http-status/429",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit exceeded",
  "instance": "/my-route",
  "trace": {
    "requestId": "4d54e4ee-c003-4d75-aba9-e09a6d707b08",
    "timestamp": "2026-04-14T12:00:00.000Z",
    "buildId": "ec44e831-3a02-467e-a26c-7e401e4473bf"
  }
}

The response also includes a Retry-After header with the number of seconds until the client can send another request (for example, Retry-After: 42).
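On the client side, a well-behaved caller reads that header before retrying. The helper below is a minimal sketch of that step. The name retryDelayMs and the one-second fallback are assumptions for illustration, and it only handles the numeric seconds form described above (any other value falls back to the default).

```typescript
// Hypothetical client-side helper: how long to wait after a 429.
// Only the "number of seconds" form of Retry-After is handled here.
function retryDelayMs(retryAfter: string | null, fallbackMs = 1_000): number {
  if (retryAfter === null) return fallbackMs;
  const trimmed = retryAfter.trim();
  const seconds = Number(trimmed);
  // Reject empty strings, non-numeric values, and negative numbers.
  if (trimmed !== "" && Number.isFinite(seconds) && seconds >= 0) {
    return Math.ceil(seconds * 1_000);
  }
  return fallbackMs;
}

console.log(retryDelayMs("42")); // 42000
console.log(retryDelayMs(null)); // 1000 (header missing, use the fallback)
console.log(retryDelayMs("soon")); // 1000 (not a number, use the fallback)
```

With the standard fetch API, the input would come from response.headers.get("Retry-After") whenever response.status is 429.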

4. Choose production limits

The requestsAllowed: 2 value above exists so the limit triggers on your third curl. Production APIs need numbers that reflect real usage. There is no single right answer, but these reference points from widely used APIs are a useful starting point:

API	Typical per-consumer limit
Stripe	100 read and 100 write requests per second per account
GitHub	5,000 authenticated requests per hour per user
Twilio	100 requests per second per account (varies by resource)
Shopify	40 requests per app per store (bucket refills at 2/second)

When sizing your own limit, consider three inputs:

What your backend can sustain. Start from a conservative fraction of your backend's measured capacity so that a single caller cannot exhaust it.
What legitimate callers actually do. If p99 usage for your best customers is 10 requests per minute, a 100-per-minute limit leaves headroom without being overly permissive.
How your customers are structured. Per-API-key limits usually give tighter control than per-IP; a single corporate IP can hide dozens of real users.

It is almost always easier to raise a limit in response to a support ticket than to lower one that customers have started relying on. When in doubt, start low, measure, and increase.
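The first two inputs can be combined into a rough starting number. The function below is only a sketch of that arithmetic: the name suggestRequestsAllowed, the headroom factor, and the capacity share are hypothetical knobs chosen for illustration, not recommendations from Zuplo.

```typescript
// Hypothetical sizing heuristic. The inputs are illustrative knobs,
// not values recommended by Zuplo.
type SizingInputs = {
  p99PerMinute: number; // observed p99 requests per minute for real callers
  headroomFactor: number; // multiple of p99 to allow before limiting
  backendCapacityPerMinute: number; // measured sustainable backend throughput
  maxShareOfCapacity: number; // largest fraction one caller may consume (0..1)
};

// Takes the lower of "usage plus headroom" and "share of backend capacity",
// and never returns less than 1.
function suggestRequestsAllowed(inputs: SizingInputs): number {
  const fromUsage = Math.ceil(inputs.p99PerMinute * inputs.headroomFactor);
  const fromCapacity = Math.floor(
    inputs.backendCapacityPerMinute * inputs.maxShareOfCapacity,
  );
  return Math.max(1, Math.min(fromUsage, fromCapacity));
}

// Usage-bound case: 10 req/min at p99 with 10x headroom gives 100.
console.log(
  suggestRequestsAllowed({
    p99PerMinute: 10,
    headroomFactor: 10,
    backendCapacityPerMinute: 6_000,
    maxShareOfCapacity: 0.25,
  }),
); // 100

// Capacity-bound case: a small backend caps the same caller at 50.
console.log(
  suggestRequestsAllowed({
    p99PerMinute: 10,
    headroomFactor: 10,
    backendCapacityPerMinute: 200,
    maxShareOfCapacity: 0.25,
  }),
); // 50
```

The result maps onto requestsAllowed with timeWindowMinutes set to 1. Treat it as a first guess to measure against, in line with the start-low advice above.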

5. Rate limit authenticated users

IP-based limits are a good first layer, but they penalize every user behind a shared NAT or corporate proxy. For an authenticated API, limit per consumer instead. This requires an authentication policy earlier in the pipeline so that request.user is populated before the rate limit policy runs.

The full policies configuration looks like this:


{
  "policies": [
    {
      "name": "api-key-auth",
      "policyType": "api-key-inbound",
      "handler": {
        "export": "ApiKeyInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "allowUnauthenticatedRequests": false
        }
      }
    },
    {
      "name": "rate-limit-per-user",
      "policyType": "rate-limit-inbound",
      "handler": {
        "export": "RateLimitInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "rateLimitBy": "user",
          "requestsAllowed": 60,
          "timeWindowMinutes": 1
        }
      }
    }
  ]
}

Attach both policies to the route, with authentication first so the rate limit policy has a user to group by:


{
  "x-zuplo-route": {
    "policies": {
      "inbound": ["api-key-auth", "rate-limit-per-user"]
    }
  }
}
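Since the ordering matters, a small guard in a test or pre-deploy script can catch a reversed array. This is a sketch only: authRunsBeforeRateLimit is a hypothetical helper, not a Zuplo API, and it assumes you pass it the route's policies.inbound array.

```typescript
// Hypothetical guard: true only when both policies are attached and the
// authentication policy appears before the rate limit policy.
function authRunsBeforeRateLimit(
  inbound: string[],
  authPolicy: string,
  rateLimitPolicy: string,
): boolean {
  const authIndex = inbound.indexOf(authPolicy);
  const rateLimitIndex = inbound.indexOf(rateLimitPolicy);
  return authIndex !== -1 && rateLimitIndex !== -1 && authIndex < rateLimitIndex;
}

console.log(
  authRunsBeforeRateLimit(
    ["api-key-auth", "rate-limit-per-user"],
    "api-key-auth",
    "rate-limit-per-user",
  ),
); // true

console.log(
  authRunsBeforeRateLimit(
    ["rate-limit-per-user", "api-key-auth"],
    "api-key-auth",
    "rate-limit-per-user",
  ),
); // false: the rate limit policy would run before any user is set
```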

Create two API keys in the Zuplo Portal (or with the CLI) so you can verify that each consumer has its own counter. Then send requests with each key:

# Replace with the tokens from your two API keys.
KEY_A="zpka_xxxxxxxxxxxxxxxxxxxxxx"
KEY_B="zpka_yyyyyyyyyyyyyyyyyyyyyy"

# Burn through the limit on key A; key B should still succeed.
for i in $(seq 1 61); do
  curl -s -o /dev/null -w "A #$i: %{http_code}\n" \
    -H "Authorization: Bearer $KEY_A" \
    http://localhost:9000/my-route
done

curl -s -w "\nB #1: %{http_code}\n" \
  -H "Authorization: Bearer $KEY_B" \
  http://localhost:9000/my-route

Requests 1–60 for key A return 200, request 61 returns 429, and the first request for key B still returns 200. That confirms the counter is scoped to each consumer, not shared across the API key pool.

See API Key Authentication for the full walkthrough of creating and managing API keys. If you use JWT authentication instead, replace the api-key-auth policy with your JWT policy — the rate limit policy works the same way as long as request.user.sub is populated.

Next steps

Understand the mechanics:

How Rate Limiting Works — The sliding window algorithm, every rateLimitBy mode in detail, and advanced options like mode, headerMode, and throwOnFailure.

Customize the behavior:

Dynamic Rate Limiting — Vary limits per caller using a custom TypeScript function (for example, higher limits for paid plans).
Per-user limits with a database — An advanced example using ZoneCache and a database lookup to drive limits per customer.

Combine with other policies:

Combining Policies — Stack per-minute and per-hour limits, pair rate limiting with quotas, and layer in monetization.

Operate in production:

Monitoring and Troubleshooting — Observe limits in production, alert on silent failures, and diagnose unexpected 429s.

Reference:

Rate Limiting policy reference — Every configuration option for the standard policy.
Complex Rate Limiting policy reference — Multi-counter configuration for usage-based pricing (enterprise).


Last modified on June 11, 2026
