
Why Rate Limiting by IP Breaks Your API

Martyn Davies
May 13, 2026
9 min read

Carrier-grade NAT, cloud egress reuse, and shared proxies put unrelated callers behind one IP address, so an IP-based rate limit punishes the wrong people. Rate-limit by API key or JWT subject with Zuplo's rate-limit-inbound policy instead.

A customer’s app keeps getting 429s, but the dashboard shows their key has barely been used today. You ask them to retry; it works. Two hours later they’re back, same problem.

What you eventually spot: their corporate office shares its public outbound IP with a scraper running out of the same coworking space, or with another tenant on the same office network, or with the entire third floor of a WeWork. Your rate limit sees one IP firing thousands of requests a minute, so it punishes the IP. The customer, the scraper, and the marketing intern running a Postman collection all share the punishment. Nobody can reproduce it from their laptop because their laptop egresses somewhere else.

By the time you’ve been paged the offending traffic has stopped, the IP has rotated, and the support thread is full of “are you sure you’re not on a VPN?” The bug isn’t in your code, it’s in the assumption that an IP address identifies a caller.

Use this approach if:
  • Your gateway rate-limits by source IP and you've shipped to real customers
  • You've debugged a "ghost" 429 that you couldn't reproduce in-house
  • You're picking a rate limit key for a new API and IP is the obvious default

IP was never an identity

A source IP answers “where did this packet come from?”, not “who sent it?” Most consumer and corporate traffic on the internet now shares its egress address with a population of strangers, in four overlapping ways.

Carrier-grade NAT and mobile networks

Mobile carriers and many residential ISPs no longer hand subscribers a public IPv4 address. They sit subscribers behind carrier-grade NAT, often inside the RFC 6598 reserved range 100.64.0.0/10, and translate thousands of subscribers onto a much smaller pool of public addresses. Whichever public address comes out the other end is shared by everyone behind that NAT at that moment.

If your API is consumer-facing and your customers are on cellular, you’re rate-limiting groups of unrelated people who happen to be on the same tower. Staging won’t reproduce it; what you’ll actually see is a slow drip of “the app doesn’t work on my phone” tickets that resolve when the user switches to WiFi.
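A toy fixed-window limiter makes the failure concrete: one counter per IP means one misbehaving client behind a shared egress address burns the budget for everyone behind it. This is an illustrative sketch, not Zuplo's implementation; the class, limit, and address are made up.

```typescript
// Naive fixed-window limiter keyed by source IP (illustrative only).
class IpRateLimiter {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}

  allow(ip: string): boolean {
    const n = (this.counts.get(ip) ?? 0) + 1;
    this.counts.set(ip, n);
    return n <= this.limit;
  }
}

const limiter = new IpRateLimiter(5);
const natIp = "203.0.113.7"; // one CGNAT egress address, many subscribers

// A scraper behind the NAT burns the whole budget...
for (let i = 0; i < 5; i++) limiter.allow(natIp);

// ...and an unrelated customer behind the same NAT is now rejected.
console.log(limiter.allow(natIp)); // false
```

The limiter is doing exactly what it was told; the problem is that the key it was given identifies the NAT, not the caller.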

Cloud NAT egress reuse

On the server side the same problem shows up in the cloud egress pool. A request from AWS Lambda or ECS goes out through a NAT gateway with a small pool of egress IPs shared by every workload in the VPC. Google’s Cloud NAT docs are explicit about this: “VMs use a set of shared external IP addresses to connect to the internet.” Cloud Run and Cloud Functions inherit that behaviour when fronted by Cloud NAT.

When two customers host their integrations on the same provider in the same region, their backends share egress, and two unrelated tenants end up on one counter.

Those addresses aren’t stable either: unless a customer pins an Elastic IP to their NAT gateway, the public address is drawn from the provider’s pool and re-issued to a different tenant when the gateway is recreated. The address that belonged to a happy customer last month belongs to someone else’s batch job today. Banning by IP here is banning by coincidence.

IPv6 prefix ambiguity

IPv6 was supposed to fix this and mostly hasn’t. The unit a rate limiter should treat as one caller is the prefix assigned to the subscriber, not the full 128-bit address. A home broadband customer is typically delegated a /56 or /64, and any device behind their router gets a fresh address inside that prefix.

Key on the full /128 and you give a single subscriber thousands of free counters. IPv6 privacy extensions (on by default in most operating systems) cycle the trailing bits every few hours: the same phone might be 2001:db8:1::abc this hour and 2001:db8:1::f00 the next, both inside the same /64, both treated by the limiter as a brand-new caller.

Key on the /64 and the opposite happens: hosting providers may route entire data-centre blocks as a single /64, collapsing a whole region into one counter. No prefix length is right for every network, and the IETF guidance on end-site assignment explicitly leaves the choice to operators.
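If you do have to key on IPv6, bucketing by prefix rather than the full /128 is the usual compromise. A minimal sketch of the idea, assuming valid input: a hypothetical helper that expands the `::` shorthand and keeps the leading hextets (here defaulting to a /64). Real code would validate addresses and make the prefix length an operator-tunable setting, for exactly the reasons above.

```typescript
// Hypothetical helper: collapse an IPv6 address to its leading hextets
// so privacy-extension churn inside one /64 maps to one bucket.
// Assumes a syntactically valid address; no validation is performed.
function ipv6Bucket(addr: string, prefixHextets = 4): string {
  const [head, tail = ""] = addr.split("::");
  const headParts = head ? head.split(":") : [];
  const tailParts = tail ? tail.split(":") : [];
  // Zero-fill the hextets elided by "::" to recover all 8 groups.
  const missing = 8 - headParts.length - tailParts.length;
  const full = [
    ...headParts,
    ...Array(Math.max(missing, 0)).fill("0"),
    ...tailParts,
  ];
  return full
    .slice(0, prefixHextets)
    .map((h) => h.padStart(4, "0"))
    .join(":");
}

// Both privacy-extension addresses from the example land in one bucket:
console.log(ipv6Bucket("2001:db8:1::abc")); // "2001:0db8:0001:0000"
console.log(ipv6Bucket("2001:db8:1::f00")); // "2001:0db8:0001:0000"
```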

Tor, VPNs, and shared proxies

Tor exit nodes, commercial VPN providers, and corporate egress proxies concentrate huge populations of users onto small pools of addresses by design. A few of those users are abusing your API, but most are doing what their employer or threat model told them to do. An IP-based rate limit can’t tell them apart, so it either lets abusers through (if generous enough not to break privacy-conscious users) or locks legitimate users out (if tight enough to slow the abusers).

When IP is the right key

There are two cases where IP is the best signal you’ve got, and you should use it without apology in both.

Truly unauthenticated endpoints. Signup, password reset, public search, the contact form. There’s no caller identity to fall back on, and an IP-based cap discourages casual scripting. Pair it with a CAPTCHA the first time the threshold trips, so false positives have an escape hatch.

DDoS pre-filtering. A blunt per-IP ceiling, set far above any legitimate caller’s usage, catches the obvious volumetric stuff before it reaches the rest of your pipeline. This isn’t your real rate limit, it’s the moat outside the wall.
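As a sketch, the moat can be the same rate-limit-inbound policy with rateLimitBy set to ip and a deliberately generous ceiling. The policy name and numbers here are illustrative; pick a ceiling far above any legitimate caller's burst:

```json
{
  "name": "ddos-pre-filter",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "ip",
      "requestsAllowed": 5000,
      "timeWindowMinutes": 1
    }
  }
}
```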

For every other endpoint, the right key is the caller, and the caller is something you authenticated.

What to key on instead

In order of preference for an authenticated API:

  • API key, when the caller is a machine or another system. Two requests with the same key are the same caller, regardless of where they egress from.
  • Customer or user ID, when the API key belongs to an account with multiple keys. Rate-limit the account, not the key, so rotating a key or issuing a second one doesn’t double a customer’s effective budget.
  • JWT subject (sub claim), when the caller is an end user behind an OAuth or OIDC token. The subject is stable per user across devices and sessions.
  • Custom function, when the right key is a composite. A common pattern is “tenant ID for paid plans, IP for the free tier”, computed at request time from the auth context.

Notice what’s not on the list: device or TLS fingerprints. They belong in fraud and abuse pipelines where false positives are tolerable, not in a rate limiter where they reproduce the same shared-key problem IPs already have.

Configure rate limits by caller

Zuplo’s rate-limit-inbound policy takes a rateLimitBy option with four values: user, ip, function, and all. The default is user:

{
  "name": "rate-limit-inbound-policy",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}

user reads request.user.sub, the stable per-caller identifier populated by whichever auth policy ran ahead of the rate limiter on the route (API key or JWT). Order matters: auth first, then rate limit, otherwise there’s no sub to key on and every caller collapses into a single shared bucket.
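In route-config terms, that ordering looks something like the following sketch. The route path, backend URL, and policy names are illustrative; what matters is that the auth policy appears before the rate limiter in the inbound array:

```json
{
  "/v1/widgets": {
    "get": {
      "x-zuplo-route": {
        "handler": {
          "export": "urlForwardHandler",
          "module": "$import(@zuplo/runtime)",
          "options": { "baseUrl": "https://api.internal.example.com" }
        },
        "policies": {
          "inbound": [
            "api-key-inbound",
            "rate-limit-inbound-policy"
          ]
        }
      }
    }
  }
}
```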

For anything other than the caller ID, switch rateLimitBy to function and write a small handler that returns the bucket key. The example below keys paid callers by tenantId and free callers by source IP. tenantId isn’t a Zuplo concept: it’s a field you stash on the API key’s consumer metadata when you provision the key, identifying which of your customers (or which team inside a customer) the key belongs to. Keying off it means two API keys issued to the same tenant share one bucket, which is usually what billing expects.

// modules/rate-limit-key.ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

export default function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // After auth, request.user.data is whatever the auth policy attached:
  //   - api-key-inbound: the consumer metadata you set on the key
  //   - any JWT policy:  the verified token payload
  const tenantId = request.user?.data?.tenantId;

  if (tenantId) {
    // Paid caller: bucket per tenant.
    return { key: `tenant-${tenantId}` };
  }

  // Free tier: fall back to the source IP. Zuplo runs on Cloudflare,
  // so cf-connecting-ip is the trusted client IP.
  const ip = request.headers.get("cf-connecting-ip");
  return { key: `ip-${ip ?? "unknown"}` };
}

A few things to know about the function:

  • The auth policy on the route runs first and populates request.user. The function reads from it, but doesn’t authenticate.
  • request.user.data is typed as unknown at compile time because Zuplo doesn’t know the shape you put on the consumer or the JWT. Narrow it with an interface for type safety; the third snippet below shows the pattern.
  • key must be a string. The "unknown" literal preserves the type guarantee on the IP fallback rather than letting null slip through.
  • Returning undefined or null skips the rate limit entirely for that request, useful for internal allow-lists.
  • A non-string key throws a RuntimeError at request time.
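The skip behaviour in the last two bullets can be sketched without the Zuplo imports, using a local stand-in type so the snippet is self-contained. The header name and the RateLimitDetails alias are hypothetical; in a real policy you'd return a CustomRateLimitDetails and read from the request:

```typescript
// Local stand-in for the shape the rate-limit function returns.
type RateLimitDetails = { key: string } | undefined;

// Hypothetical allow-list: traffic flagged internal skips the rate
// limit entirely by returning undefined; everyone else gets a bucket.
function rateLimitKeyWithAllowList(
  internalFlag: string | null,
  tenantId: string | undefined,
  ip: string | null,
): RateLimitDetails {
  if (internalFlag === "true") {
    return undefined; // no bucket: the request is not rate limited
  }
  if (tenantId) {
    return { key: `tenant-${tenantId}` };
  }
  return { key: `ip-${ip ?? "unknown"}` }; // key stays a string
}

console.log(rateLimitKeyWithAllowList("true", "acme", null)); // undefined
console.log(rateLimitKeyWithAllowList(null, "acme", null)); // key "tenant-acme"
```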

Per-plan caps from a subscription

The cleaner pattern when paid plans are involved is to put monetization-inbound in front of the rate limiter instead of api-key-inbound. The monetization policy validates the API key, checks the consumer’s subscription and payment status, populates request.user with the same { sub, data } shape, and stashes the full subscription record on the request context. One inbound policy, not two.

The rate-limit function then reads the plan key off the subscription and returns a different cap per tier:

// modules/rate-limit-by-plan.ts
import {
  CustomRateLimitDetails,
  MonetizationInboundPolicy,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

const PLAN_LIMITS: Record<
  string,
  { requestsAllowed: number; timeWindowMinutes: number }
> = {
  free: { requestsAllowed: 60, timeWindowMinutes: 1 },
  pro: { requestsAllowed: 1000, timeWindowMinutes: 1 },
  enterprise: { requestsAllowed: 10000, timeWindowMinutes: 1 },
};

export default function rateLimitByPlan(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // monetization stashes the subscription on context, not on request.user
  const subscription = MonetizationInboundPolicy.getSubscriptionData(context);
  const planKey = subscription?.plan.key ?? "free";
  // defensive: an unknown plan slug still resolves to a real cap
  const limits = PLAN_LIMITS[planKey] ?? PLAN_LIMITS.free;

  return {
    // subscription id is stable per contract; sub is the fallback when monetization didn't run; "anonymous" keeps the key a string
    key: `sub-${subscription?.id ?? request.user?.sub ?? "anonymous"}`,
    // per-request override of the policy's default cap, so one policy covers every tier
    requestsAllowed: limits.requestsAllowed,
    timeWindowMinutes: limits.timeWindowMinutes,
  };
}

MonetizationInboundPolicy.getSubscriptionData(context) is the static helper that pulls the subscription the monetization policy stashed earlier in the chain. The function returns both the bucket key and the cap, so one rate-limit-inbound policy on one route handles every plan tier with no per-plan policies and no proliferating config.

Per-customer caps without redeploying

If you’re not running the monetization policy but still want per-customer caps that change without a config redeploy, the same trick works against the API key metadata directly. Stash a requestsPerMinute field on the consumer record and read it from request.user.data:

// modules/rate-limit-from-key-metadata.ts
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

interface ConsumerMetadata {
  tenantId?: string;
  requestsPerMinute?: number;
}

export default function rateLimitFromKeyMetadata(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
  // user.data is whatever JSON the customer attached to the consumer; a typed cast is the convention here
  const metadata = request.user?.data as ConsumerMetadata | undefined;
  const tenantId = metadata?.tenantId;

  if (tenantId) {
    return {
      key: `tenant-${tenantId}`,
      // a tenant without an explicit cap still needs one
      requestsAllowed: metadata?.requestsPerMinute ?? 100,
      timeWindowMinutes: 1,
    };
  }

  const ip = request.headers.get("cf-connecting-ip");
  return {
    key: `ip-${ip ?? "unknown"}`,
    // illustrative free-tier cap; pick what fits your product
    requestsAllowed: 10,
    timeWindowMinutes: 1,
  };
}

Edit the consumer metadata on a single API key and the cap changes on the next request without a redeploy, subject only to the API key cache TTL.

ip mode stays available for the two cases above, and the answer to “should this endpoint use it?” is almost always no.

All three examples share one shape: a small TypeScript function that returns a CustomRateLimitDetails. That’s the entire programmable surface. Compose the bucket key from JWT claims, the request path, Cloudflare country headers, feature flags, or any other signal you can read off the request or pull from the context. Override requestsAllowed and timeWindowMinutes per request so one policy serves every tier you ship. Return undefined to skip the limit entirely for an internal allow-list, or branch on whatever your business logic needs. No DSL, no rules engine, no waiting on a roadmap ticket for the option you want: just a function running at the edge with full access to the request and the auth context.

If you have an existing gateway keyed on IP, the migration is one option change per route plus an auth policy ahead of it. The customer whose ticket you couldn’t reproduce stops opening tickets, and the next time someone asks for a per-account exception or a plan tier you didn’t model upfront, the answer is one function edit and a redeploy.