Let's be honest: rate limiting has a reputation problem.
Developers hate hitting rate limits. They hate the cryptic error messages. They hate the guessing game of "how long until I can try again?" They hate feeling punished for using an API they're paying for.
And they're right to hate it—because most rate limiting is implemented badly.
But rate limiting isn't optional if you're monetizing an API. You need it to enforce plan limits, protect infrastructure, prevent abuse, and ensure fair access. The question isn't whether to rate limit—it's how to do it without making your users want to throw their laptop out the window.
Let's build rate limiting that developers actually respect.
## The Four Rate Limiting Algorithms (And When to Use Each)
Before we talk about implementation, you need to understand your options:
### 1. Fixed Window

The simplest approach: "100 requests per minute, counter resets on the minute."

- Pros: Easy to understand, easy to implement, predictable for users
- Cons: "Thundering herd" at window boundaries: users can make 100 requests at 11:59:59 and 100 more at 12:00:00

Best for: Simple APIs where burst behavior is acceptable
### 2. Sliding Window

Smooths the fixed window by counting requests over a rolling time period.

- Pros: Prevents window-boundary gaming, more consistent enforcement
- Cons: More complex to implement, harder for users to predict

Best for: APIs where consistent throughput matters more than burst allowance
### 3. Token Bucket

Users have a "bucket" of tokens. Each request consumes one. Tokens refill at a steady rate.

- Pros: Allows bursts while enforcing an average rate, intuitive "budget" mental model
- Cons: Users need to understand the token economics

Best for: APIs where occasional bursts are acceptable but sustained high volume isn't
### 4. Leaky Bucket

Requests queue up and process at a steady rate, like water leaking from a bucket.

- Pros: Perfectly smooth output rate, protects downstream services
- Cons: Introduces latency (requests queue instead of executing immediately)

Best for: When you need to protect a fixed-capacity downstream system
**Pro tip:** In 2026, token bucket is winning. It's the most intuitive for developers (think of it like a spending budget) and balances burst tolerance with sustained rate control. Unless you have specific requirements, start here.
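To make the mental model concrete, here's a minimal in-memory token bucket sketch. It assumes a single process; a production system would back the counter with a shared store like Redis, and the capacity and refill rate below are example values, not recommendations.

```typescript
// A minimal single-process token bucket (illustrative only).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number, // sustained average rate
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if rate limited.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Notice the two dials: capacity controls how big a burst you tolerate, while the refill rate controls the long-run average.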
## The New Hotness: Points-Based Rate Limiting
Simple request counting is becoming obsolete. The problem: not all requests are equal.
A request that returns 10 items is cheaper than one returning 10,000 items. A read operation is cheaper than a write. A cached response is cheaper than one requiring database queries.
Enter points-based rate limiting, pioneered by companies like Atlassian. Each request consumes "points" based on actual resource usage:
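For illustration, here's a sketch of charging requests against a points budget. The operation names and costs are hypothetical, not any vendor's actual schedule; the point is that the cost varies with the work performed.

```typescript
// Hypothetical point costs -- tune these to your infrastructure profile.
const OPERATION_COST: Record<string, number> = {
  "GET /items/:id": 1,       // cached single read
  "GET /items": 5,           // paginated list
  "POST /items": 10,         // write
  "GET /reports/export": 50, // heavy aggregation
};

// Charge a request against the caller's remaining budget.
// Returns the new balance, or null if the budget is exhausted.
function charge(budget: number, operation: string): number | null {
  const cost = OPERATION_COST[operation] ?? 1; // unknown ops cost 1
  return cost <= budget ? budget - cost : null;
}
```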
This approach:
- Aligns costs with actual infrastructure impact
- Discourages expensive operations without blocking them
- Rewards efficient API usage patterns
With an API gateway like Zuplo, you can implement this using the complex rate limiting policy, which lets you define named counters and dynamically set their increments per request.
## Error Responses That Don't Suck
Here's where most APIs fail: the 429 response. A typical bad implementation:
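It often looks something like this (a representative example, not any specific vendor's output):

```http
HTTP/1.1 429 Too Many Requests
Content-Type: text/plain

Too Many Requests
```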
This tells developers nothing. They have to guess when they can retry, how many requests they have left, and what limit they hit.
Here's what a good 429 looks like:
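For example (the docs URL, timestamps, and numbers are placeholders):

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689660

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "You've used all 100 requests for this minute.",
    "retry_after_seconds": 30,
    "docs_url": "https://example.com/docs/rate-limits"
  }
}
```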
The essential headers (`Retry-After` is standardized in HTTP; the `X-RateLimit-*` headers are a de facto convention most APIs follow):
| Header | Purpose |
|---|---|
| `Retry-After` | Seconds until they can retry |
| `X-RateLimit-Limit` | Total requests allowed in the window |
| `X-RateLimit-Remaining` | Requests left in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
**Common mistake:**
The biggest rate limiting mistake? Not returning rate limit headers on successful requests. Developers need to see their remaining quota on every response so they can manage their usage proactively, not just when they've already failed.
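That means a successful response should carry the same headers as a 429 (values here are illustrative):

```http
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1735689660
```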
## Rate Limits as Product Feature
Here's the mindset shift: rate limits aren't just protection—they're product differentiation.
| Plan | Rate Limit | Monthly Price | $/request |
|---|---|---|---|
| Free | 10/min | $0 | — |
| Starter | 100/min | $29 | Pennies |
| Pro | 1,000/min | $199 | Cheaper |
| Enterprise | 10,000/min | $999+ | Cheapest |
Rate limits create urgency to upgrade. When a customer consistently hits their 100/min limit, the sales conversation is easy: "You're hitting limits. Want 10x capacity?"
This only works if you:
- Surface usage data prominently in dashboards
- Send proactive alerts before limits are hit
- Make upgrading frictionless (one-click plan change)
## Graceful Degradation: The Art of Being Nice
Hard rate limits—where you return 429 and block the request—are sometimes necessary. But for many scenarios, graceful degradation is better:
### Strategy 1: Slow down, don't stop
Instead of blocking, add latency as users approach limits:
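A sketch of the idea; the usage thresholds and delays below are illustrative, not tuned values:

```typescript
// Map quota usage to an artificial delay instead of a hard block.
function throttleDelayMs(used: number, limit: number): number {
  const usage = used / limit;
  if (usage < 0.8) return 0;    // normal service
  if (usage < 0.95) return 500; // gentle brake
  if (usage < 1.0) return 2000; // heavy brake
  return 5000;                  // over limit: crawl, but don't fail
}
```

The server would `await` a sleep for the returned duration before handling the request.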
Users experience degradation as slowness rather than failure. This is often acceptable where hard failures aren't.
### Strategy 2: Reduce fidelity
Return less data instead of failing:
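A sketch, assuming a hypothetical item payload; near the limit it returns a capped, summary-only response instead of a 429:

```typescript
// Hypothetical payload shape -- substitute your own resource type.
interface Item {
  id: string;
  name: string;
  description: string;
  stats: Record<string, number>;
}

// Near the limit, drop heavy fields and cap the page size.
function reduceFidelity(items: Item[], nearLimit: boolean): Partial<Item>[] {
  if (!nearLimit) return items; // full fidelity while under the threshold
  return items.slice(0, 10).map(({ id, name }) => ({ id, name }));
}
```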
### Strategy 3: Queue instead of reject
For non-time-sensitive operations, accept the request and process it later:
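A minimal in-memory sketch; a real system would use a durable queue (a job table, SQS, and so on) rather than an array:

```typescript
// In-memory stand-in for a durable queue.
type Job = { id: string; payload: unknown };
const deferred: Job[] = [];

// Over the limit, accept the work with 202 Accepted and defer it
// instead of rejecting with 429.
function handleWrite(job: Job, overLimit: boolean): { status: number; body: object } {
  if (!overLimit) {
    return { status: 200, body: { processed: job.id } };
  }
  deferred.push(job); // a background worker drains this later
  return { status: 202, body: { queued: job.id } };
}
```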
## Multi-Tier Rate Limiting
Real-world APIs need multiple rate limit layers, and each layer serves a different purpose:
- Global limits protect your infrastructure from total overload
- Per-customer limits enforce plan tiers and prevent one customer from affecting others
- Per-endpoint limits protect expensive operations from abuse
- Per-IP limits prevent credential stuffing and brute force attacks
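A sketch of evaluating the layers in order, from broadest to narrowest, and reporting which one tripped; the check functions are placeholders for real counter lookups:

```typescript
interface Req { customerId: string; ip: string; path: string; }

// Each entry stands in for a real counter lookup (Redis, gateway, etc.).
type LimitCheck = { name: string; allowed: (req: Req) => boolean };

// Returns the name of the first layer that blocks the request,
// or null if every layer allows it.
function firstBlockedLimit(req: Req, checks: LimitCheck[]): string | null {
  for (const check of checks) {
    if (!check.allowed(req)) return check.name;
  }
  return null;
}
```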
When a request is blocked, tell the user which limit they hit:
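For example (field names and numbers are illustrative):

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "limit": "per-endpoint",
    "message": "The /reports/export endpoint allows 10 requests per minute.",
    "retry_after_seconds": 42
  }
}
```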
## Implementation: The Zuplo Way
Building production-grade rate limiting from scratch is surprisingly complex. You need:
- Distributed counters (rate limits must work across multiple servers)
- Efficient storage (Redis, not your primary database)
- Low-latency lookups (you're adding latency to every request)
- Edge deployment (limit as close to users as possible)
Modern API gateways handle this for you:
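In Zuplo, for instance, the simple case is roughly a policy entry like this; the option names follow Zuplo's rate-limit policy documentation, but verify against the current reference before using:

```json
{
  "name": "rate-limit-by-user",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}
```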
That's it. The gateway handles distributed counting, header injection, and 429 responses automatically.
## The Psychology of Rate Limits
Here's a secret: how rate limits feel matters as much as the actual numbers.
Two approaches with identical limits:
**Approach A (feels punitive):**
- Limit: 100 requests/minute
- Error message: "Rate limit exceeded"
- Reset: silent, users have to guess
**Approach B (feels supportive):**
- Limit: 100 requests/minute
- Error message: "You've used your quota quickly! Here's when it resets."
- Reset: countdown shown in dashboard
- Bonus: email notification at 80% usage
Same limits. Completely different developer experience.
The companies winning on developer experience invest in:
- Transparency: Always show current usage and limits
- Predictability: Same behavior every time
- Communication: Warn before failure, not just after
- Self-service: Easy upgrade path when limits don't fit
## Checklist: Rate Limiting Done Right
Before you ship, verify you've got these:
- Rate limit headers on ALL responses (not just 429s)
- Retry-After header with clear reset time
- JSON error body with limit details and docs link
- Dashboard visibility showing usage vs. limit
- Proactive alerts at 80% and 95% usage
- One-click upgrade from rate limit warning
- Consistent behavior (no random variations)
- Multi-tier limits for different protection layers
- Graceful degradation for non-critical scenarios
- Documentation explaining each limit tier
## Programmable and Dynamic Rate Limiting
Static rate limits are a starting point, but production APIs almost always need limits that adapt to context. The subscriber on a free plan should not get the same throughput as the enterprise customer paying six figures. A lightweight read endpoint should not share the same budget as a heavy analytics export. And if your traffic patterns shift between business hours and off-peak windows, your limits should be able to shift with them.
This is where programmable rate limiting shines. In Zuplo, you set `rateLimitBy` to `"function"` in the rate limit policy and point it at a custom module. That module exports a function that receives each request and returns a `CustomRateLimitDetails` object: the key to bucket on, the number of requests allowed, and the time window. Here are three patterns that come up constantly in production systems.
### Tier-Based Limits
The most common dynamic pattern ties rate limits to the consumer's subscription tier. The logic reads a claim from the authenticated user context and returns the corresponding limit:
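A sketch of such a module. The tier names mirror the pricing table above, and the interfaces are local stand-ins for the types a real Zuplo module would import from `@zuplo/runtime` (field names based on Zuplo's documented `CustomRateLimitDetails` shape; verify against the current docs):

```typescript
// Local stand-ins for the types a real Zuplo module would import.
interface CustomRateLimitDetails {
  key: string;
  requestsAllowed: number;
  timeWindowMinutes: number;
}
interface UserContext { sub: string; data?: { tier?: string } }

// Limits mirror the pricing table above (illustrative numbers).
const TIER_LIMITS: Record<string, number> = {
  free: 10,
  starter: 100,
  pro: 1000,
  enterprise: 10000,
};

function rateLimitByTier(user: UserContext): CustomRateLimitDetails {
  const tier = user.data?.tier ?? "free";
  return {
    key: user.sub, // one bucket per authenticated consumer
    requestsAllowed: TIER_LIMITS[tier] ?? TIER_LIMITS.free,
    timeWindowMinutes: 1,
  };
}
```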
With this approach, upgrading a customer's rate limit is as simple as changing their tier in your identity provider or API key metadata. No redeployment, no config file changes, no downtime.
### Endpoint-Specific Limits
Not every route costs the same to serve. A cached lookup by ID is orders of magnitude cheaper than an aggregation query that scans millions of rows. You can assign each route its own limit to protect expensive operations without penalizing lightweight ones:
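A sketch, with hypothetical route budgets:

```typescript
interface RouteLimitDetails {
  key: string;
  requestsAllowed: number;
  timeWindowMinutes: number;
}

// Per-route budgets (hypothetical numbers).
const ROUTE_LIMITS: Record<string, number> = {
  "/items": 1000,        // cheap cached reads
  "/reports/export": 10, // expensive aggregation
};

function rateLimitByRoute(userId: string, path: string): RouteLimitDetails {
  return {
    // Including the route in the key gives each endpoint its own counter.
    key: `${userId}:${path}`,
    requestsAllowed: ROUTE_LIMITS[path] ?? 100, // default for unlisted routes
    timeWindowMinutes: 1,
  };
}
```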
The key trick here is including the route in the rate limit key. That way each endpoint has its own independent counter rather than sharing a single global bucket per user.
### Time-of-Day Limits
Some APIs see predictable traffic spikes during business hours and relative quiet overnight. You can give consumers more headroom during off-peak windows to encourage them to shift batch workloads to times when your infrastructure is underutilized:
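A sketch; the off-peak window and multiplier are illustrative:

```typescript
// Double the budget overnight (UTC window and multiplier are examples).
function requestsAllowedAt(baseLimit: number, date: Date): number {
  const hour = date.getUTCHours();
  const offPeak = hour < 6 || hour >= 22; // overnight window
  return offPeak ? baseLimit * 2 : baseLimit;
}
```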
You can of course combine all three patterns. A single rate limit handler can read the user's tier, look up the endpoint cost, check the time of day, and compute a final limit that accounts for all three factors. That is the power of having real code, not just a configuration toggle, sitting in the request path.
For a side-by-side look at how different API platforms support these kinds of programmable rate limiting capabilities, see our API rate limiting platform comparison.
## Conclusion
Rate limiting is where monetization meets developer experience. Do it well, and you protect your infrastructure while guiding customers toward upgrades. Do it poorly, and you create frustrated developers who blame your API for their problems.
The difference isn't in the algorithms—it's in the execution. Communicate clearly. Degrade gracefully. Make limits visible and upgrades easy.
Rate limiting doesn't have to create rage. It can create revenue.
Your move.