A customer’s app keeps getting 429s, but the dashboard shows their key has
barely been used today. You ask them to retry, it works. Two hours later they’re
back, same problem.
What you eventually spot: their corporate office shares its public outbound IP with a scraper running out of the same coworking space, or with another tenant on the same office network, or with the entire third floor of a WeWork. Your rate limit sees one IP firing thousands of requests a minute, so it punishes the IP. The customer, the scraper, and the marketing intern running a Postman collection all share the punishment. Nobody can reproduce it from their laptop because their laptop egresses somewhere else.
By the time you’ve been paged the offending traffic has stopped, the IP has rotated, and the support thread is full of “are you sure you’re not on a VPN?” The bug isn’t in your code, it’s in the assumption that an IP address identifies a caller.
This post is for you if:
- Your gateway rate-limits by source IP and you've shipped to real customers
- You've debugged a "ghost" 429 that you couldn't reproduce in-house
- You're picking a rate limit key for a new API and IP is the obvious default
IP was never an identity
A source IP answers “where did this packet come from?”, not “who sent it?” Most consumer and corporate traffic on the internet now shares its egress address with a population of strangers, in four overlapping ways.
Carrier-grade NAT and mobile networks
Mobile carriers and many residential ISPs no longer hand subscribers a public
IPv4 address. They sit subscribers behind carrier-grade NAT, often inside the
RFC 6598 reserved range 100.64.0.0/10, and translate thousands of
subscribers onto a much smaller pool of public addresses. Whichever public
address comes out the other end is shared by everyone behind that NAT at that
moment.
If your API is consumer-facing and your customers are on cellular, you’re rate-limiting groups of unrelated people who happen to be on the same tower. Staging won’t reproduce it; what you’ll actually see is a slow drip of “the app doesn’t work on my phone” tickets that resolve when the user switches to WiFi.
Cloud NAT egress reuse
On the server side the same problem shows up in the cloud egress pool. A request from AWS Lambda or ECS goes out through a NAT gateway with a small pool of egress IPs shared by every workload in the VPC. Google’s Cloud NAT docs are explicit about this: “VMs use a set of shared external IP addresses to connect to the internet.” Cloud Run and Cloud Functions inherit that behaviour when fronted by Cloud NAT.
When two customers host their integrations on the same provider in the same region, their backends share egress, and two unrelated tenants end up on one counter.
Those addresses aren’t stable either: unless a customer pins an Elastic IP to their NAT gateway, the public address is drawn from the provider’s pool and re-issued to a different tenant when the gateway is recreated. The address that belonged to a happy customer last month belongs to someone else’s batch job today. Banning by IP here is banning by coincidence.
IPv6 prefix ambiguity
IPv6 was supposed to fix this and mostly hasn’t. The unit a rate limiter should
treat as one caller is the prefix assigned to the subscriber, not the full
128-bit address. A home broadband customer is typically delegated a /56 or
/64, and any device behind their router gets a fresh address inside that
prefix.
Key on the full /128 and you give a single subscriber thousands of free
counters. IPv6 privacy extensions (on by default in most operating systems)
periodically regenerate the interface identifier, the trailing 64 bits: the
same phone might be 2001:db8:1::abc today and 2001:db8:1::f00 tomorrow, both
inside the same /64, both treated by the limiter as a brand-new caller.
Key on the /64 and the opposite happens: hosting providers may route entire
data-centre blocks as a single /64, collapsing a whole region into one
counter. No prefix length is right for every network, and the IETF guidance on
end-site assignment explicitly leaves the choice to operators.
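
If you do end up keying unauthenticated IPv6 traffic, the practical compromise is to bucket on a prefix you pick per deployment rather than the full /128. Here is a minimal sketch of that normalization; the helper name is made up, and the /64 default is exactly the operator judgment discussed above, not a recommendation:

```ts
// Hypothetical helper: collapse an IPv6 address onto the first `prefixBits`
// bits so every address a subscriber's router hands out shares one bucket.
// (Ignores IPv4-mapped forms; enough to illustrate the idea.)
function ipv6Bucket(address: string, prefixBits = 64): string {
  // Expand the "::" shorthand into all eight 16-bit groups.
  const [head, tail = ""] = address.split("::");
  const headParts = head ? head.split(":") : [];
  const tailParts = tail ? tail.split(":") : [];
  const missing = 8 - headParts.length - tailParts.length;
  const groups = [
    ...headParts,
    ...Array(Math.max(missing, 0)).fill("0"),
    ...tailParts,
  ].map((g) => parseInt(g || "0", 16));

  // Keep the prefix, zero out everything after it.
  const masked = groups.map((value, i) => {
    const bitsBefore = i * 16;
    if (bitsBefore + 16 <= prefixBits) return value; // fully inside the prefix
    if (bitsBefore >= prefixBits) return 0; // fully outside the prefix
    const keep = prefixBits - bitsBefore; // this group straddles the boundary
    return value & ((0xffff << (16 - keep)) & 0xffff);
  });

  return masked.map((v) => v.toString(16)).join(":") + "/" + prefixBits;
}
```

With that in place, 2001:db8:1::abc and 2001:db8:1::f00 both map to 2001:db8:1:0:0:0:0:0/64 and share one counter, while the choice of prefix length stays an explicit decision rather than an accident of keying on the raw address.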
Tor, VPNs, and shared proxies
Tor exit nodes, commercial VPN providers, and corporate egress proxies concentrate huge populations of users onto small pools of addresses by design. A few of those users are abusing your API, but most are doing what their employer or threat model told them to do. An IP-based rate limit can’t tell them apart, so it either lets abusers through (if generous enough not to break privacy-conscious users) or locks legitimate users out (if tight enough to slow the abusers).
When IP is the right key
Two cases where IP is the best signal you’ve got, and you should use it without apology in both.
Truly unauthenticated endpoints. Signup, password reset, public search, the contact form. There’s no caller identity to fall back on, and an IP-based cap discourages casual scripting. Pair it with a CAPTCHA the first time the threshold trips, so false positives have an escape hatch.
DDoS pre-filtering. A blunt per-IP ceiling, set far above any legitimate caller’s usage, catches the obvious volumetric stuff before it reaches the rest of your pipeline. This isn’t your real rate limit, it’s the moat outside the wall.
For every other endpoint, the right key is the caller, and the caller is something you authenticated.
What to key on instead
In order of preference for an authenticated API:
- API key, when the caller is a machine or another system. Two requests with the same key are the same caller, regardless of where they egress from.
- Customer or user ID, when the API key belongs to an account with multiple keys. Rate-limit the account, not the key, so rotating a key or issuing a second one doesn’t double a customer’s effective budget.
- JWT subject (sub claim), when the caller is an end user behind an OAuth or OIDC token. The subject is stable per user across devices and sessions.
- Custom function, when the right key is a composite. A common pattern is “tenant ID for paid plans, IP for the free tier”, computed at request time from the auth context.
Notice what’s not on the list: device or TLS fingerprints. They belong in fraud and abuse pipelines where false positives are tolerable, not in a rate limiter where they reproduce the same shared-key problem IPs already have.
Configure rate limits by caller
Zuplo’s rate-limit-inbound policy takes a rateLimitBy option with
four values: user, ip, function, and all. The default is user:
user reads request.user.sub, the stable per-caller identifier populated by
whichever auth policy ran ahead of the rate limiter on the route (API
key or JWT). Order matters: auth first, then rate limit,
otherwise there’s no sub to key on and every caller collapses into a single
shared bucket.
For anything other than the caller ID, switch rateLimitBy to function and
write a small handler that returns the bucket key. The example below keys paid
callers by tenantId and free callers by source IP. tenantId isn’t a Zuplo
concept: it’s a field you stash on the API key’s consumer metadata when you
provision the key, identifying which of your customers (or which team inside a
customer) the key belongs to. Keying off it means two API keys issued to the
same tenant share one bucket, which is usually what billing expects.
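
A minimal version of that function might look like the following. The module path, the exported name, and the cf-connecting-ip header used for the client address are assumptions to adapt to your deployment; tenantId is the metadata field described above.

```ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

export function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // request.user was populated by the auth policy that ran before this one.
  // data is unknown at compile time, so cast it to the shape we provisioned.
  const data = request.user?.data as { tenantId?: string } | undefined;

  // Paid callers: every key issued to the tenant shares one bucket.
  if (data?.tenantId) {
    return { key: `tenant:${data.tenantId}` };
  }

  // Free tier: fall back to the source IP. headers.get returns string | null,
  // so the "unknown" literal keeps the key a string rather than letting null through.
  const ip = request.headers.get("cf-connecting-ip") ?? "unknown";
  return { key: `ip:${ip}` };
}
```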
A few things to know about the function:
- The auth policy on the route runs first and populates request.user. The function reads from it, but doesn't authenticate.
- request.user.data is typed as unknown at compile time because Zuplo doesn't know the shape you put on the consumer or the JWT. Narrow it with an interface for type safety; the third snippet below shows the pattern.
- key must be a string. The "unknown" literal preserves the type guarantee on the IP fallback rather than letting null slip through.
- Returning undefined or null skips the rate limit entirely for that request, useful for internal allow-lists.
- A non-string key throws a RuntimeError at request time.
Per-plan caps from a subscription
The cleaner pattern when paid plans are involved is to put
monetization-inbound in front of the rate limiter instead of
api-key-inbound. The monetization policy validates the API key, checks the
consumer’s subscription and payment status, populates request.user with the
same { sub, data } shape, and stashes the full subscription record on the
request context. One inbound policy, not two.
The rate-limit function then reads the plan key off the subscription and returns a different cap per tier:
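
A sketch of that function follows. It assumes MonetizationInboundPolicy is exported from @zuplo/runtime alongside the other runtime types; the planKey field name on the subscription record and the per-tier numbers are placeholders for whatever your plans actually define.

```ts
import {
  CustomRateLimitDetails,
  MonetizationInboundPolicy,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

// Requests per minute for each plan tier; the numbers are illustrative.
const planCaps: Record<string, number> = {
  free: 60,
  pro: 600,
  enterprise: 6000,
};

export function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // The subscription record the monetization policy stashed on the context.
  const subscription = MonetizationInboundPolicy.getSubscriptionData(context);

  // "planKey" is assumed; read whichever field your subscription record
  // uses to name the plan.
  const plan =
    (subscription as { planKey?: string } | undefined)?.planKey ?? "free";

  return {
    key: request.user?.sub ?? "unknown",
    requestsAllowed: planCaps[plan] ?? planCaps.free,
    timeWindowMinutes: 1,
  };
}
```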
MonetizationInboundPolicy.getSubscriptionData(context) is the static helper
that pulls the subscription the monetization policy stashed earlier in the
chain. The function returns both the bucket key and the cap, so one
rate-limit-inbound policy on one route handles every plan tier with no
per-plan policies and no proliferating config.
Per-customer caps without redeploying
If you’re not running the monetization policy but still want per-customer caps
that change without a config redeploy, the same trick works against the API key
metadata directly. Stash a requestsPerMinute field on the consumer record and
read it from request.user.data:
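
A sketch of that function, using an interface to narrow request.user.data; requestsPerMinute is the field you chose when provisioning the key, and the fallback cap is illustrative:

```ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

// Shape of the metadata we stashed on the API key consumer.
// requestsPerMinute is our own field, not a Zuplo built-in.
interface ConsumerMetadata {
  requestsPerMinute?: number;
}

export function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // data is typed as unknown; narrow it to the shape we provisioned.
  const data = request.user?.data as ConsumerMetadata | undefined;

  return {
    key: request.user?.sub ?? "unknown",
    requestsAllowed: data?.requestsPerMinute ?? 100, // default when the field is absent
    timeWindowMinutes: 1,
  };
}
```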
Edit the consumer metadata on a single API key and the cap changes on the next request without a redeploy, subject only to the API key cache TTL.
ip mode stays available for the two cases above, and the answer to “should
this endpoint use it?” is almost always no.
The Rate Limiting Policy Reference covers every rateLimitBy mode, the function signature for custom keys, and how the policy reads request.user.sub.
All three examples share one shape: a small TypeScript function that returns a
CustomRateLimitDetails. That’s the entire programmable surface. Compose the
bucket key from JWT claims, the request path, Cloudflare country headers,
feature flags, or any other signal you can read off the request or pull from the
context. Override requestsAllowed and timeWindowMinutes per request so one
policy serves every tier you ship. Return undefined to skip the limit entirely
for an internal allow-list, or branch on whatever your business logic needs. No
DSL, no rules engine, no waiting on a roadmap ticket for the option you want,
just a function running at the edge with full access to the request and the auth
context.
If you have an existing gateway keyed on IP, the migration is one option change per route plus an auth policy ahead of it. The customer whose ticket you couldn’t reproduce stops opening tickets, and the next time someone asks for a per-account exception or a plan tier you didn’t model upfront, the answer is one function edit and a redeploy.
