Spike protection, per-tier limits, monthly quotas — at the edge
Per-IP, per-API-key, per-tenant, per-token. Programmable in TypeScript, configured in JSON, enforced before traffic touches your origin — with no Redis cluster to operate.
Rate limits are the cheapest insurance you can buy
You're one scraper, one runaway script, one curious AI agent away from a P0 — and the gateway is the only place that can stop it before your origin pays the price. The hard part isn't deciding to throttle; it's throttling fairly, per consumer, per resource, with state that actually scales.
One bad client takes the API down
A misconfigured cron loop, a runaway notebook, an enthusiastic AI agent — the gateway has no idea, your origin slows, and every other customer feels it.
Free tier eating your margin
Limits exist in the docs, not in the gateway. Your highest-volume users are also your unpaid users. Every conversion conversation starts with "please throttle me."
Token costs no one can predict
Your AI assistant charges by tokens; your rate limit counts requests. One "summarize this PDF" call costs as much as a thousand smaller ones, and you have no way to bill or block.
Redis to operate just to throttle
Distributed counters need shared state. So you stand up Redis, a Sentinel cluster, replication, and on-call rotations — for what should be a feature flag on the gateway.
Throttling that fits how your business actually charges
Spike protection in one config block
Drop rate-limit-inbound on a route, set requests-per-minute, ship. The gateway absorbs spikes, your origin stays calm, and 429s come with a Retry-After header your clients already know how to handle.
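As a sketch, the policies.json entry for that can look like the following. The option names (rateLimitBy, requestsAllowed, timeWindowMinutes) follow Zuplo's published rate-limit-inbound schema at the time of writing; verify against the current policy docs before copying.

```json
{
  "name": "basic-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "ip",
      "requestsAllowed": 120,
      "timeWindowMinutes": 1
    }
  }
}
```

Reference the policy by name in the route's inbound policy list and every request on that route is counted before it reaches your origin.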
Throttle by anything you have data on
Per-customer, per-plan, per-tenant, per-region, per-time-of-day — or any combination. The limit is yours to define against any attribute on the request, the API key, or your business data. Rate limiting that fits your contracts, not the other way around.
Charge usage by what actually costs you
Most rate limits count one request as one unit — but a 5-token chat call doesn't cost the same as a 5,000-token one. Throttle on tokens, compute, downstream calls, or anything else, with each request charging the budget proportional to its real cost.
Different limits for free, pro, and enterprise — automatically
Most rate-limiting tools box you into a fixed set of dimensions and dropdowns. Zuplo lets you write the rule itself in TypeScript — full access to the API key, request, and your own data — so the limit fits the way your business actually charges. Tier overrides, geo overrides, time-of-day rules, customer-by-customer carve-outs: all real code, all running at the edge.
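A minimal sketch of what such a rule can look like. The types below are simplified stand-ins for @zuplo/runtime's ZuploRequest and CustomRateLimitDetails, and the cf-ipcountry header name is an assumption about what the edge network provides; treat this as illustration, not the exact API.

```typescript
// Simplified stand-in types; the real function would be typed with
// ZuploRequest/ZuploContext and return CustomRateLimitDetails.
type Req = { headers: { get(name: string): string | null }; user?: { sub: string } };
type RateLimitDetails = { key: string; requestsAllowed: number; timeWindowMinutes: number };

export function geoTimeLimit(request: Req, nowUtcHour: number): RateLimitDetails {
  const key = request.user?.sub ?? "anonymous";
  // Geo carve-out: one region has a negotiated higher ceiling.
  // (cf-ipcountry is an assumed edge-provided header.)
  if (request.headers.get("cf-ipcountry") === "DE") {
    return { key, requestsAllowed: 1200, timeWindowMinutes: 1 };
  }
  // Time-of-day rule: off-peak hours get double the allowance.
  const offPeak = nowUtcHour < 6 || nowUtcHour >= 22;
  return { key, requestsAllowed: offPeak ? 600 : 300, timeWindowMinutes: 1 };
}
```

Because the rule is plain code, a new carve-out is a new branch, not a new product feature request.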
Free · 60 / min · 12,480 consumers
Pro · 600 / min · 412 consumers
Enterprise · Custom, 6,000+ / min · 38 consumers
Allowed (1m): 124,183
Throttled (1m): 842
Monthly caps and multi-meter accounting, in one policy
complex-rate-limit-inbound lets one request burn from multiple counters — tokens, compute units, downstream calls — at amounts you decide. quota-inbound stretches the window from minutes to months for hourly, daily, weekly, and monthly meters. Both run inside the same edge pipeline as your rate limits.
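A hedged sketch of a monthly meter; the option names here (quotaBy, allowance, period) are illustrative placeholders rather than the exact quota-inbound schema, so consult the policy reference for the real field names.

```json
{
  "name": "monthly-request-quota",
  "policyType": "quota-inbound",
  "handler": {
    "export": "QuotaInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "quotaBy": "user",
      "allowance": 1000,
      "period": "month"
    }
  }
}
```

The point is the shape: a quota is just another inbound policy entry, attached to routes the same way a per-minute limit is.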
Rate limiting that fits real product shapes
Most gateways stop at "requests per minute per IP." Real products need per-plan, per-token, per-org, per-something-only-you-know.
The rule is yours to write
Most gateways limit you to whatever dimensions their dropdowns offer. Zuplo lets you write the rate-limit rule in TypeScript with full access to the request, the API key, and any data you have — so a tier promotion, a geo carve-out, or a one-off enterprise contract is a few lines of code, not a support ticket.
One pipeline, every kind of limit
Per-second spikes, multi-dimension counters that bill tokens or compute units, monthly contractual quotas — they all attach the same way, share the same auth context, and stack on the same route. Stop running rate limits in one product and quotas in another.
No Redis, no operations
Counter state replicates across the edge automatically. Nothing to provision, no maxmemory eviction policy to tune, no version mismatch between the gateway and a separate cache cluster.
Token-aware for AI workloads
setIncrements lets one request consume multiple units from a counter. "Charge 1500 tokens to the user's monthly quota and 1 request to their per-minute bucket" — one policy block, one source of truth.
What teams use this for
“We need a free tier with strict limits and a paid tier with generous ones.”
Read the customer's plan from their API key metadata and return the matching limit — strict for free, generous for paid, custom for enterprise. New plan? Add a branch. New customer? They inherit their tier's bucket the moment they hit the gateway.
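That lookup can be sketched as a pure function, assuming the plan lives in the API key's metadata; the plan names and numbers below are invented for illustration, and the real function would return CustomRateLimitDetails from @zuplo/runtime.

```typescript
// Illustrative metadata shape; in Zuplo this would come from the
// authenticated request's API key metadata.
type KeyMetadata = { plan?: "free" | "pro" | "enterprise"; customRpm?: number };
type Limit = { requestsAllowed: number; timeWindowMinutes: number };

export function limitForPlan(meta: KeyMetadata): Limit {
  switch (meta.plan) {
    case "enterprise":
      // Enterprise contracts can carry a negotiated per-customer rate.
      return { requestsAllowed: meta.customRpm ?? 6000, timeWindowMinutes: 1 };
    case "pro":
      return { requestsAllowed: 600, timeWindowMinutes: 1 };
    default:
      // Unknown or missing plan falls back to the strict free tier.
      return { requestsAllowed: 60, timeWindowMinutes: 1 };
  }
}
```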
“Our LLM costs are out of control because one prompt = many tokens.”
Charge each request to a token budget instead of a request count. From your handler you tell the gateway how many tokens this request actually used; cheap prompts cost little, expensive ones cost a lot, and the throttle reflects real cost.
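The accounting side of that can be sketched as a pure function; how the resulting charge is reported back to the gateway (e.g. via setIncrements) is Zuplo-specific and omitted here, and the meter names are invented for illustration.

```typescript
// Token usage as reported by the LLM backend.
type Usage = { promptTokens: number; completionTokens: number };
// Hypothetical meter names: one budget counted in tokens, one in requests.
type Charge = { tokenMeter: number; requestMeter: number };

export function chargeFor(usage: Usage): Charge {
  return {
    // Bill the real token cost: a 5,000-token call burns 1,000x a 5-token one.
    tokenMeter: usage.promptTokens + usage.completionTokens,
    // Every call still counts once against the per-minute bucket.
    requestMeter: 1,
  };
}
```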
“I need a 1,000-call monthly quota that resets on the customer's billing date.”
Set a monthly quota and anchor the reset to each customer's billing day with a custom function. Surface remaining balance in your developer portal so customers always know where they stand before the next invoice.
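One way to anchor the reset, sketched as a hypothetical helper: derive the counter key from the billing period the current date falls in, so the counter naturally rolls over on the customer's billing day. The function name and key format are illustrative, not a Zuplo API.

```typescript
// Returns a key like "cus_123:2024-3" identifying the billing period the
// given instant falls in, for a customer whose cycle starts on billingDay.
export function billingPeriodKey(customerId: string, billingDay: number, now: Date): string {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth(); // 0-11
  // Before the billing day we are still in the period that began last month.
  const start =
    now.getUTCDate() >= billingDay
      ? new Date(Date.UTC(y, m, 1))
      : new Date(Date.UTC(y, m - 1, 1)); // JS normalizes month -1 to December
  return `${customerId}:${start.getUTCFullYear()}-${start.getUTCMonth() + 1}`;
}
```

Because the key changes when the period does, the old counter simply stops being incremented; no explicit reset job is needed.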
“Block scrapers but don't slow down legitimate traffic.”
Stack two limits on the same route: a tight per-IP burst limit catches anonymous scrapers, a generous per-API-key limit handles authenticated traffic. Whichever one a request hits first returns a 429 — your real customers never feel it.
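Sketched as two policy entries attached to the same route; the option names follow Zuplo's rate-limit-inbound schema at the time of writing, and the numbers are examples, so verify both before use.

```json
[
  {
    "name": "scraper-burst-guard",
    "policyType": "rate-limit-inbound",
    "handler": {
      "export": "RateLimitInboundPolicy",
      "module": "$import(@zuplo/runtime)",
      "options": { "rateLimitBy": "ip", "requestsAllowed": 30, "timeWindowMinutes": 1 }
    }
  },
  {
    "name": "customer-limit",
    "policyType": "rate-limit-inbound",
    "handler": {
      "export": "RateLimitInboundPolicy",
      "module": "$import(@zuplo/runtime)",
      "options": { "rateLimitBy": "user", "requestsAllowed": 1000, "timeWindowMinutes": 1 }
    }
  }
]
```

Anonymous scrapers exhaust the tight per-IP bucket first; authenticated customers are judged only against the generous per-user bucket.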
Frequently Asked Questions
Common questions about rate limiting with Zuplo.
Throttle the right traffic, not all the traffic
Spin up a free Zuplo project, drop in a rate-limit policy, and you're protecting your origin in under 10 minutes.