How to Rate Limit an API

Every API hits this at some point. One customer’s job runs hot, starts firing thousands of requests a minute, and the rest of your users feel it as slow responses or errors. Nobody did anything malicious. The API just couldn’t tell “one customer is running a loop” apart from “demand is up across the board”.

That’s the gap rate limiting closes. It tells the gateway how many requests a single caller can make in a window of time, and what to do when they go over. Most APIs need it. Plenty of teams put it off because the literature makes it sound complicated, when a first version is a short block of configuration.

This post is the execution companion to The subtle art of API Rate-Limiting, which covers the design decisions. Here we go from an OpenAPI spec to a working rate limit on a Zuplo gateway, using a todo list API as the running example. Swap the spec for your own and the steps line up exactly.

Use this approach if:

You have an existing API and an OpenAPI spec for it
You don't yet have rate limiting in front of it, or what you have is a per-server, in-memory counter
You want a working setup in under an hour, not a project

Why APIs need rate limiting

Without a rate limit, three things go wrong, roughly in this order of frequency:

A noisy customer takes everyone down. A retry loop or an over-eager batch job is enough to saturate a shared backend, and the rest of your users feel it as latency or 5xxs.
Your API gets scraped or probed. Public endpoints get crawled, brute-forced, and tested for vulnerabilities. Without a per-caller cap, an attacker doesn’t have to be clever, just patient.
Your bill grows in directions you didn’t plan for. Compute, egress, and any downstream LLM or third-party API calls all cost money on every request. Rate limits put a ceiling on how much any one caller can spend on your behalf.

The fix in all three cases is the same: cap how many requests a caller can make in a given window, and reject the rest with a 429 Too Many Requests.
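Stripped to its core, that fix is a counter per caller. The sketch below shows the idea in a single process with a fixed window, the simplest variant. The `FixedWindowLimiter` class is illustrative and not part of Zuplo, whose policy uses a sliding window and a shared store.

```typescript
// Illustrative only: an in-memory, fixed-window limiter for a single process.
// Each caller key gets a counter that resets when its window expires.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly requestsAllowed: number;
  private readonly windowMs: number;

  constructor(requestsAllowed: number, windowMs: number) {
    this.requestsAllowed = requestsAllowed;
    this.windowMs = windowMs;
  }

  // Returns the status a gateway would send: 200 to forward, 429 to reject.
  check(callerKey: string, now: number = Date.now()): 200 | 429 {
    const current = this.windows.get(callerKey);
    if (!current || now - current.start >= this.windowMs) {
      // First request from this caller, or their previous window has expired.
      this.windows.set(callerKey, { start: now, count: 1 });
      return 200;
    }
    if (current.count < this.requestsAllowed) {
      current.count += 1;
      return 200;
    }
    return 429;
  }
}

// Two requests per caller per minute, with explicit timestamps in milliseconds.
const limiter = new FixedWindowLimiter(2, 60_000);
console.log(limiter.check("alice", 0), limiter.check("alice", 1), limiter.check("alice", 2)); // 200 200 429
console.log(limiter.check("bob", 3)); // 200: bob has his own counter
```

A fixed window resets all at once, so a caller can fit up to twice the limit into a short span that straddles a window boundary. That is one reason gateways tend to prefer sliding windows.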

How Zuplo’s rate-limit-inbound policy works

Zuplo handles this with the rate-limit-inbound policy. You attach it to a route and configure three things:

rateLimitBy: who shares a counter. user (per API key or JWT subject), ip (per source IP), all (one global counter), or function (a custom TypeScript function decides per request).
requestsAllowed: how many requests fit in the window. Default 1000.
timeWindowMinutes: how long the window is. Default 60.

The policy uses a sliding window. Zuplo runs in 300+ edge locations and synchronises counts between them, so a caller who exhausts their limit in London can’t pick up a fresh window by routing through Tokyo. When a caller crosses the line they get a 429 with a Retry-After header.
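You don’t write any of this with Zuplo, but a sketch makes “sliding window” concrete. One common way to implement it is a per-caller log of request timestamps: each request counts against the limit for exactly one window length after it was made, and Retry-After is the time until the oldest logged request ages out. The class below is a single-process illustration under those assumptions. It doesn’t model Zuplo’s cross-region synchronisation, and Zuplo’s internal algorithm may differ.

```typescript
type Decision = { allowed: true } | { allowed: false; retryAfterSeconds: number };

// Illustrative sliding-window log: one process, in memory, no cross-region sync.
class SlidingWindowLimiter {
  private readonly log = new Map<string, number[]>();
  private readonly requestsAllowed: number;
  private readonly windowMs: number;

  constructor(requestsAllowed: number, windowMs: number) {
    this.requestsAllowed = requestsAllowed;
    this.windowMs = windowMs;
  }

  check(key: string, now: number = Date.now()): Decision {
    // Keep only the timestamps that still fall inside the window ending at `now`.
    const recent = (this.log.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length < this.requestsAllowed) {
      recent.push(now);
      this.log.set(key, recent);
      return { allowed: true };
    }
    // Rejected requests aren't logged, so retrying doesn't extend the penalty.
    this.log.set(key, recent);
    // Capacity frees up when the oldest request in the window ages out.
    const waitMs = recent[0] + this.windowMs - now;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}

// Two requests per minute: calls at 0s and 30s pass, a third at 45s must wait 15s.
const sliding = new SlidingWindowLimiter(2, 60_000);
sliding.check("alice", 0);
sliding.check("alice", 30_000);
console.log(sliding.check("alice", 45_000)); // { allowed: false, retryAfterSeconds: 15 }
console.log(sliding.check("alice", 60_000)); // { allowed: true }: the 0s request has aged out
```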

No code, no Redis, no Lua script. The subtle art post covers the strict-vs-async trade-off if you want to dig into the synchronisation.

Import an OpenAPI spec into Zuplo

The fastest way to get a Zuplo gateway in front of an existing API is to import its OpenAPI document. Zuplo turns each operation in the spec into a route on the gateway, ready for policies.

For this walkthrough, assume a small todos API:

json

{
  "openapi": "3.1.0",
  "info": {
    "title": "Todos API",
    "version": "1.0.0"
  },
  "servers": [{ "url": "https://todo.zuplo.io" }],
  "paths": {
    "/todos": {
      "get": {
        "operationId": "listTodos",
        "summary": "Get all todos",
        "responses": {
          "200": { "description": "OK" }
        }
      }
    },
    "/todos/{id}": {
      "get": {
        "operationId": "getTodo",
        "summary": "Get a todo",
        "parameters": [
          {
            "name": "id",
            "in": "path",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": { "description": "OK" }
        }
      }
    }
  }
}

Sign in at portal.zuplo.com and create a new empty project. Open config/routes.oas.json and use the Import OpenAPI option to upload the spec.

[Screenshot: the portal’s Code view with the Import OpenAPI dialog open. It accepts JSON, YML, and YAML files.]

Zuplo merges the operations into config/routes.oas.json and keeps any Zuplo-specific settings on existing routes intact. Each operation becomes a route with a default URL Forward handler (Zuplo’s term for a passthrough proxy) pointing at the spec’s servers URL, so the gateway is already proxying requests to your backend.
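“Each operation becomes a route” is easy to picture in code. The sketch below walks the `paths` object of the todos spec above and lists the method, path, and forwarding target each route would get. It’s a mental model of the import, not Zuplo’s importer, and the `RouteSummary` shape and `routesFromSpec` function are made up for illustration.

```typescript
// The slice of an OpenAPI 3.x document this sketch reads.
type OpenApiDoc = {
  servers?: { url: string }[];
  paths: Record<string, Record<string, unknown>>;
};

// Hypothetical summary of a gateway route; not a Zuplo type.
type RouteSummary = { method: string; path: string; operationId?: string; forwardTo: string };

// Path items can also hold keys like "parameters" or "summary" that aren't operations.
const HTTP_METHODS = new Set(["get", "put", "post", "delete", "options", "head", "patch", "trace"]);

function routesFromSpec(doc: OpenApiDoc): RouteSummary[] {
  const forwardTo = doc.servers?.[0]?.url ?? "";
  const routes: RouteSummary[] = [];
  for (const [path, item] of Object.entries(doc.paths)) {
    for (const [method, operation] of Object.entries(item)) {
      if (!HTTP_METHODS.has(method)) continue;
      const { operationId } = operation as { operationId?: string };
      routes.push({ method: method.toUpperCase(), path, operationId, forwardTo });
    }
  }
  return routes;
}

const todosSpec: OpenApiDoc = {
  servers: [{ url: "https://todo.zuplo.io" }],
  paths: {
    "/todos": { get: { operationId: "listTodos" } },
    "/todos/{id}": { get: { operationId: "getTodo" } },
  },
};

for (const route of routesFromSpec(todosSpec)) {
  console.log(`${route.method} ${route.path} -> ${route.forwardTo} (${route.operationId})`);
}
// GET /todos -> https://todo.zuplo.io (listTodos)
// GET /todos/{id} -> https://todo.zuplo.io (getTodo)
```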

Two notes before we add policies. First, the JSON config files are the source of truth for the gateway: edit them in the portal’s code view, or wire the project to a Git repo so changes flow through pull requests. Saving in the portal redeploys the gateway.

Second, the rate limit policy can sit in policies.json once and be referenced by name from every route that needs it. No need to define it per route.

For how import works, including merge strategies, multi-file specs, and what Zuplo preserves on re-import, see OpenAPI Support in Zuplo.

Add the rate-limit-inbound policy

In the route designer (Code > routes.oas.json), pick the route you want to protect and click Add Policy on the Request side of the pipeline. Search for “rate” in the picker and you’ll see two variants. Rate Limiting is the one you want; Complex Rate Limiting is for multi-counter setups, so skip it for now.

When you apply it, the portal opens a configuration dialog with starter values already filled in:

[Screenshot: the Configure a policy dialog, prefilled with rateLimitBy “ip”, requestsAllowed 2, and timeWindowMinutes 1.]

That dialog is what gets written to config/policies.json:

json

{
  "name": "rate-limit-inbound-policy",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "ip",
      "requestsAllowed": 2,
      "timeWindowMinutes": 1
    }
  }
}

Two requests per IP per minute is deliberately tight so the testing step trips quickly. Loosen it before you point real traffic at the gateway.

The route in config/routes.oas.json references the policy by name in its inbound chain:

json

"x-zuplo-route": {
  "handler": { "export": "urlForwardHandler", "module": "$import(@zuplo/runtime)" },
  "policies": {
    "inbound": ["rate-limit-inbound-policy"]
  }
}

The portal wires this for you, but it’s worth seeing once so the moving parts are obvious. Reuse the same policy name on any other route that needs the same limit.
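Because routes point at policies by name, a typo in an inbound array is easy to miss. A check along these lines cross-references a route’s `policies.inbound` names against the `name` fields of your policy definitions, both passed in already parsed from the JSON files. `missingInboundPolicies` is a hypothetical helper, not something Zuplo ships.

```typescript
// Only the fields this check reads.
type PolicyDefinition = { name: string; policyType: string };
type ZuploRoute = { policies?: { inbound?: string[] } };

// Hypothetical lint helper: returns inbound policy names with no matching definition.
function missingInboundPolicies(route: ZuploRoute, definitions: PolicyDefinition[]): string[] {
  const defined = new Set(definitions.map((d) => d.name));
  return (route.policies?.inbound ?? []).filter((name) => !defined.has(name));
}

const definitions: PolicyDefinition[] = [
  { name: "rate-limit-inbound-policy", policyType: "rate-limit-inbound" },
];

console.log(missingInboundPolicies({ policies: { inbound: ["rate-limit-inbound-policy"] } }, definitions)); // []
console.log(missingInboundPolicies({ policies: { inbound: ["rate-limit-policy"] } }, definitions)); // [ 'rate-limit-policy' ]
```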

[Screenshot: the route’s Policies panel, with rate-limit-inbound as the only Request policy and the Request Handler set to URL Forward, forwarding to https://todo.zuplo.io.]

For every rate-limit-inbound option, the function mode, headers, and the strict vs async modes, see the Rate Limiting Policy Reference.

Common mistake:

Forgetting to raise requestsAllowed before going to production. The starter value of 2 that the portal fills in trips quickly during testing, but most APIs want hundreds or thousands of requests per window.

Test the rate limit

Save and let the gateway redeploy.

The default mode is strict: the gateway waits for a confirmed count before letting each request through, so three rapid curls trip the limit deterministically rather than racing the synchronisation.
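The difference between the two modes comes down to whether the gateway awaits the shared count before forwarding. The sketch below models that with a hypothetical `CounterStore` (an in-memory stand-in for the synchronised store) and two made-up functions, `allowStrict` and `allowAsync`. It illustrates the trade-off, not how Zuplo implements either mode.

```typescript
// Hypothetical shared counter; in a real gateway this is a round-trip to a synced store.
interface CounterStore {
  increment(key: string): Promise<number>;
}

class InMemoryStore implements CounterStore {
  private readonly counts = new Map<string, number>();
  async increment(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Strict: wait for the confirmed count, then decide.
async function allowStrict(store: CounterStore, key: string, limit: number): Promise<boolean> {
  return (await store.increment(key)) <= limit;
}

// Async: decide from the last count this node saw, and update it in the background.
// Requests that arrive before the update lands are judged on a stale number.
function allowAsync(store: CounterStore, seen: Map<string, number>, key: string, limit: number): boolean {
  const allowed = (seen.get(key) ?? 0) < limit;
  void store.increment(key).then((count) => seen.set(key, count));
  return allowed;
}

async function demo(): Promise<void> {
  const strictResults: boolean[] = [];
  const strictStore = new InMemoryStore();
  for (let i = 0; i < 3; i++) strictResults.push(await allowStrict(strictStore, "ip:203.0.113.7", 2));
  console.log("strict", strictResults); // strict [ true, true, false ]

  // Three requests land in the same tick, before any background update completes.
  const seen = new Map<string, number>();
  const asyncStore = new InMemoryStore();
  const asyncResults = [1, 2, 3].map(() => allowAsync(asyncStore, seen, "ip:203.0.113.7", 2));
  console.log("async", asyncResults); // async [ true, true, true ]
}

void demo();
```

Strict pays for its accuracy with a wait on every request; async skips the wait but can let a burst overshoot the limit until the counts catch up.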

Hammer the endpoint from your terminal:

bash

curl -i https://your-project.zuplo.app/todos
curl -i https://your-project.zuplo.app/todos
curl -i https://your-project.zuplo.app/todos

The first two return 200 OK from your backend. The third returns:

plaintext

HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/problem+json

{
  "type": "https://httpproblems.com/http-status/429",
  "title": "Too Many Requests",
  "detail": "You have exceeded the rate limit",
  ...
}

The body uses Problem Details (RFC 9457, which replaced RFC 7807), a standard format for machine-readable API errors. The Retry-After header tells well-behaved clients how many seconds to wait (here, 60) before trying again.
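On the client side, respecting that header takes a few lines. Here’s a sketch of a `fetch` wrapper that retries a 429 after sleeping for whatever Retry-After specifies, which may be a number of seconds or an HTTP date. The `fetchWithRetry` and `retryAfterMs` names, the retry cap of 3, and the 1-second fallback are assumptions; `fetchImpl` and `sleep` are injectable so the behaviour can be exercised without a live gateway.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Turn a Retry-After value (delta-seconds or an HTTP-date) into milliseconds to wait.
function retryAfterMs(header: string | null, now: number = Date.now()): number {
  const fallbackMs = 1000; // assumption: wait 1s when the header is missing or unparseable
  if (header === null) return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

// Retry on 429 up to maxRetries times, sleeping for whatever Retry-After asks for.
async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  maxRetries = 3,
  fetchImpl: FetchLike = fetch,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await res.body?.cancel(); // release the rejected response before retrying
    await sleep(retryAfterMs(res.headers.get("Retry-After")));
  }
}

// Usage against the test gateway (URL from the curl example above):
// fetchWithRetry("https://your-project.zuplo.app/todos").then((res) => console.log(res.status));
```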

If you’d rather not leave the portal, click Test Route at the top of any route’s configuration panel. The portal opens a request builder, fires the request against your live gateway, and renders the response inline. After three quick clicks of Send Request, you’ll see the same 429 you’d get from curl.

Pick the right rateLimitBy mode

The ip setting the portal prefills is the easiest one to test with, but it’s almost never the right choice for production. Two consumers behind the same NAT or cloud egress range share an IP, so one customer’s spike rate-limits the other.

In practice, the most common reason teams switch from ip to user after launch isn’t abuse: it’s a single B2B customer behind a corporate proxy whose entire team gets rate-limited as one caller.

Better defaults, in order of how often they apply:

user: the right answer for any authenticated API. Zuplo’s auth policies (API key, JWT, OAuth) all populate request.user.sub with a stable caller identifier, and the rate limit policy reads that field to give each caller their own counter. Two API keys on the same customer account share a bucket. user mode needs an authentication policy ahead of it on the route; otherwise there’s no sub to group by.
function: a TypeScript function returns a grouping key and optional per-request limit overrides, so enterprise customers get higher limits without a redeploy. Covered in Per-User Rate Limiting on Supabase and How to Rate Limit AI Agents Beyond Request Counts.
all: one global counter across every caller. Useful for protecting a downstream with a hard total ceiling, like a paid third-party API. Less useful as a customer-facing limit.
ip: keep it for genuinely unauthenticated endpoints (signup, password reset, public search). Avoid for anything with a key.
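One way to picture the four modes is as four ways of computing the counter key for a request. The sketch below uses a hypothetical `IncomingRequest` shape (with `user.sub` filled in by an auth policy and `ip` taken from the connection) and a made-up `counterKey` function. It mirrors the behaviour described above rather than Zuplo’s internals, and shows why two customers behind one NAT collide under `ip` but not under `user`.

```typescript
type RateLimitBy = "user" | "ip" | "all" | "function";

// Hypothetical request shape for illustration; not Zuplo's ZuploRequest type.
type IncomingRequest = { ip: string; user?: { sub: string } };

// Requests that produce the same key share one counter.
function counterKey(
  mode: RateLimitBy,
  req: IncomingRequest,
  keyFn?: (req: IncomingRequest) => string,
): string {
  switch (mode) {
    case "user":
      // With no auth policy earlier in the chain there is no sub to group by.
      if (!req.user) throw new Error("user mode needs an authenticated request");
      return `user:${req.user.sub}`;
    case "ip":
      return `ip:${req.ip}`;
    case "all":
      return "all";
    case "function":
      if (!keyFn) throw new Error("function mode needs a key function");
      return `fn:${keyFn(req)}`;
  }
}

// Two customers sending traffic from the same corporate egress IP.
const customerA: IncomingRequest = { ip: "198.51.100.20", user: { sub: "customer-a" } };
const customerB: IncomingRequest = { ip: "198.51.100.20", user: { sub: "customer-b" } };
console.log(counterKey("ip", customerA) === counterKey("ip", customerB)); // true: one shared bucket
console.log(counterKey("user", customerA) === counterKey("user", customerB)); // false: a bucket each
```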

Switching modes is a one-line change. Most production gateways end up with two policies on the same route: an ip one with a generous ceiling for blunt abuse protection, and a user one with the real per-customer limit underneath.
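Stacking two policies means a request has to clear both counters, in order. This sketch models that chain with a small sliding-log limiter. The limits (1,000 requests per minute per IP, 100 per minute per customer) and the `gate` function are hypothetical, chosen only to show how a per-customer limit can trip while a shared IP stays well under its ceiling.

```typescript
// Minimal sliding-log limiter, repeated here so this block stands alone.
class Limiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    const ok = recent.length < this.limit;
    if (ok) recent.push(now);
    this.hits.set(key, recent);
    return ok;
  }
}

// Hypothetical numbers: a blunt per-IP ceiling above the real per-customer limit.
const ipCeiling = new Limiter(1000, 60_000);
const perCustomer = new Limiter(100, 60_000);

// Mirrors an inbound chain of two policies: the first rejection wins,
// and a request the IP policy rejects never reaches the user policy.
function gate(ip: string, sub: string, now: number): 200 | 429 {
  if (!ipCeiling.allow(`ip:${ip}`, now)) return 429;
  if (!perCustomer.allow(`user:${sub}`, now)) return 429;
  return 200;
}

const sharedIp = "198.51.100.20";
let lastForA: 200 | 429 = 200;
for (let i = 0; i < 101; i++) lastForA = gate(sharedIp, "customer-a", i);
console.log(lastForA); // 429: customer A used up its 100 requests
console.log(gate(sharedIp, "customer-b", 101)); // 200: the shared IP is far below 1,000
```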

Where to go from here

You have a gateway in front of your API with a working rate limit, which covers the basic noisy-neighbor and abuse cases. Natural next steps, in order of how much they shift the design:

Authenticate first. A user-grouped rate limit only works if Zuplo knows who the user is. The API Key Authentication policy sits ahead of this one.
Move to dynamic limits. When the limit needs to vary per customer (free vs pro vs enterprise), switch rateLimitBy to function and read metadata off the API key.
Pair with monetization. Rate limits cap how fast a caller can hit you. To cap how much they consume in a billing period and charge for overage, the Monetization policy layers on top.
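For the dynamic-limits step, the function’s job is to turn what it knows about the caller into a grouping key and a limit. Here’s a sketch of that decision on its own. The `plan` metadata field, the per-plan numbers, and the `RateLimitDecision` shape are all assumptions for illustration; the Rate Limiting Policy Reference documents the exact function signature and return type Zuplo expects.

```typescript
type Plan = "free" | "pro" | "enterprise";

// Hypothetical caller: `sub` from the auth policy, `data` from the API key's metadata.
type Caller = { sub: string; data?: { plan?: string } };

// Hypothetical return shape: who shares a counter, and how large it is.
type RateLimitDecision = { key: string; requestsAllowed: number; timeWindowMinutes: number };

// Illustrative per-minute limits; use numbers that match your own plans.
const LIMITS: Record<Plan, number> = { free: 60, pro: 600, enterprise: 6000 };

function isPlan(value: unknown): value is Plan {
  return value === "free" || value === "pro" || value === "enterprise";
}

function decideLimit(caller: Caller): RateLimitDecision {
  // Missing or unrecognised plans fall back to the most restrictive tier.
  const raw = caller.data?.plan;
  const plan: Plan = isPlan(raw) ? raw : "free";
  return { key: caller.sub, requestsAllowed: LIMITS[plan], timeWindowMinutes: 1 };
}

console.log(decideLimit({ sub: "acme", data: { plan: "enterprise" } }).requestsAllowed); // 6000
console.log(decideLimit({ sub: "hobbyist" }).requestsAllowed); // 60
```

Because the limit is read from key metadata at request time, moving a customer to a bigger plan is a data change rather than a redeploy.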

If you want the design thinking behind why rate limits look the way they do, The subtle art of API Rate-Limiting covers the trade-offs in depth.