---
title: "Why Rate Limiting by IP Breaks Your API"
description: "Carrier-grade NAT, cloud egress reuse, and shared proxies put unrelated callers behind one IP address, so an IP-based rate limit punishes the wrong people. Rate-limit by API key or JWT subject with Zuplo's rate-limit-inbound policy instead."
canonicalUrl: "https://zuplo.com/blog/2026/05/13/dont-rate-limit-by-ip"
pageType: "blog"
date: "2026-05-13"
authors: "martyn"
tags: "rate-limiting, api-gateway"
image: "https://zuplo.com/og?text=Why%20Rate%20Limiting%20by%20IP%20Breaks%20Your%20API"
---
A customer's app keeps getting `429`s, but the dashboard shows their key has
barely been used today. You ask them to retry; it works. Two hours later they're
back, same problem.

What you eventually spot: their corporate office shares its public outbound IP
with a scraper running out of the same coworking space, or with another tenant
on the same office network, or with the entire third floor of a WeWork. Your
rate limit sees one IP firing thousands of requests a minute, so it punishes the
IP. The customer, the scraper, and the marketing intern running a Postman
collection all share the punishment. Nobody can reproduce it from their laptop
because their laptop egresses somewhere else.

By the time you've been paged, the offending traffic has stopped, the IP has
rotated, and the support thread is full of "are you sure you're not on a VPN?"
The bug isn't in your code; it's in the assumption that an IP address identifies
a caller.

<CalloutAudience
  variant="useIf"
  items={[
    `Your gateway rate-limits by source IP and you've shipped to real customers`,
    `You've debugged a "ghost" 429 that you couldn't reproduce in-house`,
    `You're picking a rate limit key for a new API and IP is the obvious default`,
  ]}
/>

## IP was never an identity

A source IP answers "where did this packet come from?", not "who sent it?" Most
consumer and corporate traffic on the internet now shares its egress address
with a population of strangers, in four overlapping ways.

### Carrier-grade NAT and mobile networks

Mobile carriers and many residential ISPs no longer hand subscribers a public
IPv4 address. They sit subscribers behind carrier-grade NAT, often inside the
[RFC 6598][rfc6598] reserved range `100.64.0.0/10`, and translate thousands of
subscribers onto a much smaller pool of public addresses. Whichever public
address comes out the other end is shared by everyone behind that NAT at that
moment.

If your API is consumer-facing and your customers are on cellular, you're
rate-limiting groups of unrelated people who happen to be on the same tower.
Staging won't reproduce it; what you'll actually see is a slow drip of "the app
doesn't work on my phone" tickets that resolve when the user switches to WiFi.
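The failure mode is easy to see in a toy model. The sketch below (illustrative only, not Zuplo's implementation; the addresses are documentation-range placeholders) is a fixed-window limiter keyed by source IP. Once a scraper behind the NAT exhausts the window, the limiter has no way to let the well-behaved subscriber through:

```typescript
// A toy fixed-window limiter keyed by source IP. Everyone behind one
// NAT shares one bucket, which is exactly the problem.
type Bucket = { count: number; resetAt: number };

class IpRateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  allow(sourceIp: string, now: number): boolean {
    const bucket = this.buckets.get(sourceIp);
    if (!bucket || now >= bucket.resetAt) {
      // First request of a fresh window: start a new count.
      this.buckets.set(sourceIp, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    bucket.count += 1;
    return bucket.count <= this.limit;
  }
}

// Two unrelated callers egress through the same CGNAT address.
const limiter = new IpRateLimiter(100, 60_000);
const natIp = "203.0.113.7"; // RFC 5737 documentation range

// A scraper behind the NAT burns the whole budget...
for (let i = 0; i < 100; i++) limiter.allow(natIp, 0);

// ...so the legitimate customer's first request of the minute is refused.
const customerAllowed = limiter.allow(natIp, 1_000); // false
```

Every identity-based key in the rest of this post is a way of giving `allow` a first argument that actually distinguishes the two callers.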

### Cloud NAT egress reuse

On the server side the same problem shows up in the cloud egress pool. A request
from AWS Lambda or ECS goes out through a NAT gateway with a small pool of
egress IPs shared by every workload in the VPC. Google's [Cloud NAT
docs][gcpnat] are explicit about this: "VMs use a set of shared external IP
addresses to connect to the internet." Cloud Run and Cloud Functions inherit
that behaviour when fronted by Cloud NAT.

When two customers host their integrations on the same provider in the same
region, their backends share egress, and two unrelated tenants end up on one
counter.

Those addresses aren't stable either: unless a customer pins an [Elastic
IP][awseip] to their NAT gateway, the public address is drawn from the
provider's pool and re-issued to a different tenant when the gateway is
recreated. The address that belonged to a happy customer last month belongs to
someone else's batch job today. Banning by IP here is banning by coincidence.

### IPv6 prefix ambiguity

IPv6 was supposed to fix this and mostly hasn't. The unit a rate limiter should
treat as one caller is the _prefix_ assigned to the subscriber, not the full
128-bit address. A home broadband customer is typically delegated a `/56` or
`/64`, and any device behind their router gets a fresh address inside that
prefix.

Key on the full `/128` and you give a single subscriber thousands of free
counters. IPv6 privacy extensions (on by default in most operating systems)
cycle the trailing bits every few hours: the same phone might be
`2001:db8:1::abc` this hour and `2001:db8:1::f00` the next, both inside the same
`/64`, both treated by the limiter as a brand-new caller.

Key on the `/64` and the opposite happens: hosting providers may route entire
data-centre blocks as a single `/64`, collapsing a whole region into one
counter. No prefix length is right for every network, and the [IETF guidance on
end-site assignment][rfc6177] explicitly leaves the choice to operators.
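If you do have to key on IPv6, the practical compromise is to normalize each address to your chosen prefix before using it as the bucket key. Here's a minimal sketch; the `/64` default and the hextet-based truncation are assumptions to adapt to the networks you actually serve:

```typescript
// Normalize an IPv6 address to its leading hextets so privacy-extension
// rotation inside one subscriber prefix collapses to one bucket key.
// prefixHextets = 4 corresponds to a /64; 3 would be a /48.
function ipv6PrefixKey(address: string, prefixHextets = 4): string {
  // Expand a "::" abbreviation back into the full eight hextets.
  const [head, tail = ""] = address.split("::");
  const headParts = head ? head.split(":") : [];
  const tailParts = tail ? tail.split(":") : [];
  const missing = 8 - headParts.length - tailParts.length;
  const hextets = [
    ...headParts,
    ...Array(Math.max(missing, 0)).fill("0"),
    ...tailParts,
  ];
  // Keep only the hextets that fall inside the prefix, zero-padded
  // so "db8" and "0db8" produce the same key.
  return hextets
    .slice(0, prefixHextets)
    .map((h) => h.padStart(4, "0"))
    .join(":");
}

// The two rotated addresses from the example collapse to one /64 key.
ipv6PrefixKey("2001:db8:1::abc"); // "2001:0db8:0001:0000"
ipv6PrefixKey("2001:db8:1::f00"); // "2001:0db8:0001:0000"
```

The prefix length stays a judgment call per the RFC 6177 guidance above; the point is only that whatever length you pick should be applied before the address reaches the counter.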

### Tor, VPNs, and shared proxies

Tor exit nodes, commercial VPN providers, and corporate egress proxies
concentrate huge populations of users onto small pools of addresses by design. A
few of those users are abusing your API, but most are doing what their employer
or threat model told them to do. An IP-based rate limit can't tell them apart,
so it either lets abusers through (if generous enough not to break
privacy-conscious users) or locks legitimate users out (if tight enough to slow
the abusers).

## When IP _is_ the right key

There are two cases where IP is the best signal you've got, and you should use
it without apology in both.

**Truly unauthenticated endpoints.** Signup, password reset, public search, the
contact form. There's no caller identity to fall back on, and an IP-based cap
discourages casual scripting. Pair it with a CAPTCHA the first time the
threshold trips, so false positives have an escape hatch.
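In Zuplo terms, that unauthenticated cap is the same `rate-limit-inbound` policy with `rateLimitBy` switched to `ip`. The policy name and the numbers below are illustrative; pick limits that fit the endpoint:

```json
{
  "name": "signup-ip-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "ip",
      "requestsAllowed": 20,
      "timeWindowMinutes": 10
    }
  }
}
```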

**DDoS pre-filtering.** A blunt per-IP ceiling, set far above any legitimate
caller's usage, catches the obvious volumetric stuff before it reaches the rest
of your pipeline. This isn't your real rate limit; it's the moat outside the
wall.

For every other endpoint, the right key is the caller, and the caller is
something you authenticated.

## What to key on instead

In order of preference for an authenticated API:

- **API key**, when the caller is a machine or another system. Two requests with
  the same key are the same caller, regardless of where they egress from.
- **Customer or user ID**, when the API key belongs to an account with multiple
  keys. Rate-limit the account, not the key, so rotating a key or issuing a
  second one doesn't double a customer's effective budget.
- **JWT subject (`sub` claim)**, when the caller is an end user behind an OAuth
  or OIDC token. The subject is stable per user across devices and sessions.
- **Custom function**, when the right key is a composite. A common pattern is
  "tenant ID for paid plans, IP for the free tier", computed at request time
  from the auth context.

Notice what's not on the list: device or TLS fingerprints. They belong in fraud
and abuse pipelines where false positives are tolerable, not in a rate limiter
where they reproduce the same shared-key problem IPs already have.

## Configure rate limits by caller

Zuplo's [`rate-limit-inbound`][rli] policy takes a `rateLimitBy` option with
four values: `user`, `ip`, `function`, and `all`. The default is `user`:

```json
{
  "name": "rate-limit-inbound-policy",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}
```

`user` reads `request.user.sub`, the stable per-caller identifier populated by
whichever auth policy ran ahead of the rate limiter on the route ([API
key][apikey] or [JWT][jwt]). Order matters: auth first, then rate limit,
otherwise there's no `sub` to key on and every caller collapses into a single
shared bucket.
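Concretely, that ordering lives in the route's `inbound` policy array, which runs top to bottom. The route path, forwarding target, and policy names below are placeholders for whatever your `routes.oas.json` actually uses:

```json
{
  "paths": {
    "/widgets": {
      "get": {
        "x-zuplo-route": {
          "handler": {
            "export": "urlForwardHandler",
            "module": "$import(@zuplo/runtime)",
            "options": { "baseUrl": "https://api.example.com" }
          },
          "policies": {
            "inbound": ["api-key-auth", "rate-limit-inbound-policy"]
          }
        }
      }
    }
  }
}
```

Swap the two entries and the rate limiter runs before `request.user` exists, which is exactly the single-shared-bucket failure described above.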

For anything other than the caller ID, switch `rateLimitBy` to `function` and
write a small handler that returns the bucket key. The example below keys paid
callers by `tenantId` and free callers by source IP. `tenantId` isn't a Zuplo
concept: it's a field _you_ stash on the API key's consumer metadata when you
provision the key, identifying which of your customers (or which team inside a
customer) the key belongs to. Keying off it means two API keys issued to the
same tenant share one bucket, which is usually what billing expects.

```ts
// modules/rate-limit-key.ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

export default function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // After auth, request.user.data is whatever the auth policy attached:
  //   - api-key-inbound: the consumer metadata you set on the key
  //   - any JWT policy:  the verified token payload
  const tenantId = request.user?.data?.tenantId;

  if (tenantId) {
    // Paid caller: bucket per tenant.
    return { key: `tenant-${tenantId}` };
  }

  // Free tier: fall back to the source IP. Zuplo runs on Cloudflare,
  // so cf-connecting-ip is the trusted client IP.
  const ip = request.headers.get("cf-connecting-ip");
  return { key: `ip-${ip ?? "unknown"}` };
}
```

A few things to know about the function:

- The auth policy on the route runs first and populates `request.user`. The
  function reads from it, but doesn't authenticate.
- `request.user.data` is typed as `unknown` at compile time because Zuplo
  doesn't know the shape you put on the consumer or the JWT. Narrow it with an
  interface for type safety; the third snippet below shows the pattern.
- `key` must be a string. The `"unknown"` literal preserves the type guarantee
  on the IP fallback rather than letting `null` slip through.
- Returning `undefined` or `null` skips the rate limit entirely for that
  request, useful for internal allow-lists.
- A non-string `key` throws a `RuntimeError` at request time.

### Per-plan caps from a subscription

The cleaner pattern when paid plans are involved is to put
[`monetization-inbound`][monetization] in front of the rate limiter instead of
`api-key-inbound`. The monetization policy validates the API key, checks the
consumer's subscription and payment status, populates `request.user` with the
same `{ sub, data }` shape, and stashes the full subscription record on the
request context. One inbound policy, not two.

The rate-limit function then reads the plan key off the subscription and returns
a different cap per tier:

```ts
// modules/rate-limit-by-plan.ts
import {
  CustomRateLimitDetails,
  MonetizationInboundPolicy,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

const PLAN_LIMITS: Record<
  string,
  { requestsAllowed: number; timeWindowMinutes: number }
> = {
  free: { requestsAllowed: 60, timeWindowMinutes: 1 },
  pro: { requestsAllowed: 1000, timeWindowMinutes: 1 },
  enterprise: { requestsAllowed: 10000, timeWindowMinutes: 1 },
};

export default function rateLimitByPlan(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // monetization stashes the subscription on context, not on request.user
  const subscription = MonetizationInboundPolicy.getSubscriptionData(context);
  const planKey = subscription?.plan.key ?? "free";
  // defensive: an unknown plan slug still resolves to a real cap
  const limits = PLAN_LIMITS[planKey] ?? PLAN_LIMITS.free;

  return {
    // subscription id is stable per contract; sub is the fallback when
    // monetization didn't run; "anonymous" keeps the key a string
    key: `sub-${subscription?.id ?? request.user?.sub ?? "anonymous"}`,
    // per-request override of the policy's default cap, so one policy covers every tier
    requestsAllowed: limits.requestsAllowed,
    timeWindowMinutes: limits.timeWindowMinutes,
  };
}
```

`MonetizationInboundPolicy.getSubscriptionData(context)` is the static helper
that pulls the subscription the monetization policy stashed earlier in the
chain. The function returns both the bucket key and the cap, so one
`rate-limit-inbound` policy on one route handles every plan tier with no
per-plan policies and no proliferating config.

### Per-customer caps without redeploying

If you're not running the monetization policy but still want per-customer caps
that change without a config redeploy, the same trick works against the API key
metadata directly. Stash a `requestsPerMinute` field on the consumer record and
read it from `request.user.data`:

```ts
// modules/rate-limit-from-key-metadata.ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

interface ConsumerMetadata {
  tenantId?: string;
  requestsPerMinute?: number;
}

export default function rateLimitFromKeyMetadata(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // user.data is whatever JSON the customer attached to the consumer;
  // a typed cast is the convention here
  const metadata = request.user?.data as ConsumerMetadata | undefined;
  const tenantId = metadata?.tenantId;

  if (tenantId) {
    return {
      key: `tenant-${tenantId}`,
      // a tenant without an explicit cap still needs one
      requestsAllowed: metadata?.requestsPerMinute ?? 100,
      timeWindowMinutes: 1,
    };
  }

  const ip = request.headers.get("cf-connecting-ip");
  return {
    key: `ip-${ip ?? "unknown"}`,
    // illustrative free-tier cap; pick what fits your product
    requestsAllowed: 10,
    timeWindowMinutes: 1,
  };
}
```

Edit the consumer metadata on a single API key and the cap changes on the next
request without a redeploy, subject only to the API key cache TTL.

`ip` mode stays available for the two cases above, and the answer to "should
this endpoint use it?" is almost always no.

<CalloutDoc
  title="Rate Limiting Policy Reference"
  description="Every rateLimitBy mode, the function signature for custom keys, and how the policy reads request.user.sub."
  href="https://zuplo.com/docs/policies/rate-limit-inbound"
  icon="book"
/>

All three examples share one shape: a small TypeScript function that returns a
`CustomRateLimitDetails`. That's the entire programmable surface. Compose the
bucket key from JWT claims, the request path, Cloudflare country headers,
feature flags, or any other signal you can read off the request or pull from the
context. Override `requestsAllowed` and `timeWindowMinutes` per request so one
policy serves every tier you ship. Return `undefined` to skip the limit entirely
for an internal allow-list, or branch on whatever your business logic needs. No
DSL, no rules engine, no waiting on a roadmap ticket for the option you want:
just a function running at the edge with full access to the request and the auth
context.

If you have an existing gateway keyed on IP, the migration is one option change
per route plus an auth policy ahead of it. The customer whose ticket you
couldn't reproduce stops opening tickets, and the next time someone asks for a
per-account exception or a plan tier you didn't model upfront, the answer is one
function edit and a redeploy.

[rfc6598]: https://datatracker.ietf.org/doc/html/rfc6598
[rfc6177]: https://datatracker.ietf.org/doc/html/rfc6177
[rli]: https://zuplo.com/docs/policies/rate-limit-inbound
[apikey]: https://zuplo.com/docs/policies/api-key-inbound
[jwt]: https://zuplo.com/docs/policies/open-id-jwt-auth-inbound
[awseip]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html
[gcpnat]: https://cloud.google.com/nat/docs/overview
[monetization]: https://zuplo.com/docs/policies/monetization-inbound