---
title: "Hard Limits, Soft Limits, and Progressive Friction for Monetized APIs"
description: "Surprise 429s break customer apps. Invisible usage climbs break customer budgets. Progressive friction is the third pattern most API teams land on, and it handles both by adding visibility: warn early, slow at the edge, stop only at runaway."
canonicalUrl: "https://zuplo.com/blog/2026/04/24/progressive-friction-for-monetized-apis"
pageType: "blog"
date: "2026-04-24"
authors: "martyn"
tags: "API Monetization"
image: "https://zuplo.com/og?text=Hard%20Limits%2C%20Soft%20Limits%2C%20and%20Progressive%20Friction%20for%20Monetized%20APIs"
---
Plan quota enforcement loses you customers in two mirror-image ways.

**Option A, hard-limit failure.** A developer wakes up at 3am to production
429s. Their app crossed its monthly plan quota overnight, and nothing warned
them.

**Option B, soft-limit failure.** The same developer signed up for overage
pricing with eyes open, so the bill itself isn't the issue. The issue is that
nothing surfaced their usage climbing through the month, so an invoice landing
at five times their budget is the first they hear of it.

Both are the same product bug in different clothes, and neither is a config
problem. The bug is a design problem, and most teams make the call before
they've thought it through.

The default framing is binary: hard (block at the threshold and return 429)
versus soft (pass the request and bill the overage). There is a third pattern,
better than either, and it's where experienced API teams end up after their
first postmortem involving one of the scenarios above.

Every team I've watched land on progressive friction got there the hard way,
usually after an end-of-month support thread that started with a developer
asking why their app "just stopped working" at 2am.

<CalloutAudience
  variant="useIf"
  items={[
    `You charge for API access on a monthly plan with a request budget`,
    `You've had a customer hit a 429 they didn't see coming, or an invoice they didn't see building`,
    `You're designing the first paid tier above your free plan and aren't sure which way to jump`,
  ]}
/>

## The three patterns

A **rate card** in Zuplo is the per-plan price sheet, and each line on it is an
**entitlement**, a feature on the plan. Entitlements come in four flavours:
Metered (tracks usage against a monthly allowance), Boolean (on/off), Static (a
fixed config value like "max 5 webhooks"), and No entitlement (feature isn't on
this plan). This post is about the Metered kind, because the other three don't
have a quota to overflow.

**Hard limit.** The request is blocked at the threshold and a 429 is returned.
Predictable and blunt, it's the right call for free tiers, for abuse prevention,
and for any entitlement where "one more request" is cheaper for you to refuse
than to serve.

**Soft limit.** The request passes, and every call above the included allowance
is billed at a per-unit rate, usually via graduated tiered pricing. This is a
revenue-positive setup when the customer expects the bill, and a support
incident when they don't, because you're no longer monetizing, you're surprising
them.

**Progressive friction.** Layered enforcement that escalates across thresholds:
a warning at 80% via the developer portal and an email to the owner, induced
latency at 95%+, soft overage billing at 100%, and a hard cutoff only much
higher (say 200% of plan) to cap runaway cost. The customer knows the ceiling is
coming, has time to upgrade, and the app never falls off a cliff in production.
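
The ladder above can be sketched as a pure function from a usage ratio to an enforcement stage. This is a minimal illustration, assuming the example thresholds from this section (80%, 95%, 100%, 200%); the names `FrictionStage`, `THRESHOLDS`, and `frictionStage` are hypothetical and not part of any Zuplo API.

```typescript
// Hypothetical stage names for illustration; not a Zuplo type.
type FrictionStage = "none" | "warn" | "slow" | "overage" | "block";

// Thresholds as fractions of the plan allowance, taken from the text above.
const THRESHOLDS = { warn: 0.8, slow: 0.95, overage: 1.0, block: 2.0 };

// Map a usage ratio (used / allowance) to the highest stage it has reached.
function frictionStage(used: number): FrictionStage {
  if (used >= THRESHOLDS.block) return "block";
  if (used >= THRESHOLDS.overage) return "overage";
  if (used >= THRESHOLDS.slow) return "slow";
  if (used >= THRESHOLDS.warn) return "warn";
  return "none";
}
```

The stages are cumulative in practice: a customer in the `overage` stage has already been warned and is still being slowed. Returning only the highest stage keeps the decision in one place and leaves the per-stage actions to the caller.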

## Implementation with Zuplo

The Monetization API is there for teams managing plans as code, but the portal
is the faster path.

### Hard limit

Use the **Free** pricing model. Set the entitlement to **Metered (track usage)**
with a usage limit of, say, 20 requests per month. No soft-limit toggle appears
because there's no overage tier to bill against. Hit the limit, you're blocked.

![Free plan rate card with a Metered entitlement and a 20-request monthly usage limit](/blog-images/progressive-friction-for-monetized-apis/free-plan-rate-card.png)

### Soft limit

Use a **Tiered** pricing model with **Graduated** price mode. Two tiers: the
first from 0 to your included allowance at $0, the second from allowance+1 to ∞
at the per-unit overage rate. Once the overage tier exists, the **Soft limit**
toggle appears. On means requests past the allowance are billed at the overage
rate. Off means they're blocked. Same monetization-inbound policy, one UI
switch.

![Tiered rate card with graduated pricing and the Soft limit toggle enabled](/blog-images/progressive-friction-for-monetized-apis/tiered-plan-rate-card.png)
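
To make the graduated arithmetic concrete, here is a rough sketch of how a two-tier graduated price could be computed. It illustrates the pricing model only and is not Zuplo's billing code; the `Tier` shape, the function name, and the numbers (10,000 included requests, 1 cent per request after that) are assumptions chosen for the example.

```typescript
// Hypothetical tier shape for illustration; prices are in whole cents.
interface Tier {
  upTo: number; // last unit covered by this tier (Infinity for the open tier)
  unitPriceCents: number; // price per unit inside this tier
}

// Graduated pricing: each unit is charged at the rate of the tier it falls in.
function graduatedCostCents(units: number, tiers: Tier[]): number {
  let cost = 0;
  let prevUpTo = 0;
  for (const tier of tiers) {
    if (units <= prevUpTo) break; // nothing left to price
    const unitsInTier = Math.min(units, tier.upTo) - prevUpTo;
    cost += unitsInTier * tier.unitPriceCents;
    prevUpTo = tier.upTo;
  }
  return cost;
}

// Assumed example plan: 10,000 requests included, then 1 cent per request.
const exampleTiers: Tier[] = [
  { upTo: 10000, unitPriceCents: 0 },
  { upTo: Infinity, unitPriceCents: 1 },
];
```

With these assumed tiers, 12,500 requests cost 2,500 cents: the first 10,000 fall in the $0 tier and only the 2,500 above the allowance are billed.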

<CalloutTip variant="tip">
  Soft limit **off** doesn't mean "no limit." It means requests past the
  allowance are blocked, even though an overage tier is configured. The toggle
  is the single switch between "bill them" and "block them," so it's worth
  confirming the state before you publish a rate card.
</CalloutTip>

<CalloutDoc
  title="Rate Cards Reference"
  description="Full reference for rate cards, entitlement types, the Soft limit toggle, and the pricing models they pair with."
  href="https://zuplo.com/docs/articles/monetization/rate-cards"
  icon="book"
/>

### Progressive friction

Start from the soft-limit configuration above, then add a `custom-code-inbound`
policy to the route. `custom-code-inbound` is Zuplo's "drop in your own
TypeScript" escape hatch: the file lives in `modules/`, and
`config/policies.json` points at it by export name.

Order it after the monetization policy in the inbound pipeline. The monetization
policy attaches the customer's subscription to the request context, so the code
that runs next can read it and add friction when usage crosses a threshold. This
is the same pattern the
[monetization-inbound policy docs](https://zuplo.com/docs/policies/monetization-inbound)
describe as a soft-limit example, with a latency step added.

![Zuplo inbound policy pipeline showing monetization-inbound, then apply-progressive-friction, then set-user-headers, ordered before the URL Forward handler](/blog-images/progressive-friction-for-monetized-apis/inbound-policy-pipeline.png)

```ts
import {
  ZuploContext,
  ZuploRequest,
  MonetizationInboundPolicy,
} from "@zuplo/runtime";

export default async function (request: ZuploRequest, context: ZuploContext) {
  const subscription = MonetizationInboundPolicy.getSubscriptionData(context);
  const entitlement = subscription?.entitlements?.["api_requests"];
  // No metered entitlement on this plan, nothing to slow down.
  if (!entitlement?.balance) return request;

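  // Fraction of the period's allowance consumed so far (1.0 = 100% of plan).
  const used = entitlement.usage / entitlement.balance;
  // Below the 95% threshold, pass the request through untouched.
  if (used < 0.95) return request;

  // At or above 95%: hold the request for 2 seconds before forwarding it.
  await new Promise((r) => setTimeout(r, 2000));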
  // Inbound headers are immutable, so clone the request to add one.
  const warned = new ZuploRequest(request);
  warned.headers.set(
    "X-Usage-Warning",
    `${Math.round(used * 100)}% of plan used`,
  );
  return warned;
}
```

On the subscription object, `"api_requests"` is the meter name you set on the
rate card, `balance` is the allowance granted for the period (not the remaining
amount, a naming quirk worth flagging), and `usage` is how much has been
consumed. The `usage` value lags the current request by one increment because it
reflects backend state, which is fine for a threshold like "slow down at 95%."

The delay is doing most of the work here, not the header. Induced latency shows
up in the developer's own logs, dashboards, and alerts, where a response header
won't. The `X-Usage-Warning` header pairs with the delay rather than replacing
it, and the stock `rate-limit-inbound` policy only emits `Retry-After`, so if
you want the client to see their remaining balance, that's on your custom code.

The snippet only implements the 95% slowdown because that's the representative
step. All three thresholds live in the same policy, each as its own `if` block.
The shape stays the same across them; what changes is the action per threshold.

At 80%, you'd add an outbound HTTP call to whichever transactional email
provider you already use: [Resend](https://resend.com/),
[Twilio SendGrid](https://sendgrid.com/),
[Cloudflare's new Email Service](https://blog.cloudflare.com/email-service/),
anything with a send API.
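
One detail the 80% step needs is deduplication, since the policy runs on every request and the owner should get one email, not thousands. The sketch below shows one way to do that, with every name in it assumed for illustration: `SendEmail` stands in for your provider's send call, and `warnOnce`, the key format, and the in-memory `Set` are hypothetical.

```typescript
// Hypothetical sender signature; wire it to your email provider's send API.
type SendEmail = (to: string, subject: string, body: string) => Promise<void>;

// Send the 80% warning at most once per subscription per billing period.
// `notified` is an in-memory Set for illustration only; gateway instances do
// not share memory, so production code would need a shared store.
async function warnOnce(
  notified: Set<string>,
  subscriptionId: string,
  period: string, // billing-period key such as "2026-04" (assumed format)
  ownerEmail: string,
  used: number, // usage divided by allowance
  send: SendEmail,
): Promise<boolean> {
  if (used < 0.8) return false;
  const key = `${subscriptionId}:${period}:warn-80`;
  if (notified.has(key)) return false;
  // Record the key before awaiting so overlapping requests don't double-send.
  notified.add(key);
  await send(
    ownerEmail,
    "You have used 80% of your plan",
    `Usage is at ${Math.round(used * 100)}% of this period's allowance.`,
  );
  return true;
}
```

Marking the key before the send completes trades a possible missed email (if the send fails) for never sending duplicates. Whether that is the right trade depends on how you'd rather fail.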

At 200%, a second `if` block returns a 429 and cuts the customer off. Usage this
far over has stopped looking like a friction problem and started looking like an
incident.

<CalloutTip variant="tip">
  The snippet shows the shape, not a production implementation. Four places
  worth hardening before it sees real traffic:

- **Scale the delay with usage.** A flat 2-second `setTimeout` holds a worker
  slot per slowed request. Under a burst, those stack up and add latency to
  requests that weren't even over the threshold.
- **Move thresholds and the meter name into policy options.** Otherwise tweaking
  "slow down at 95%" to "slow down at 90%" means a code change and a redeploy.
- **Log via `context.log` when friction fires.** A customer opening a "my app
  feels slow" ticket is much easier to diagnose if you can see whether friction
  was the cause.
- **Use the IETF `RateLimit` draft header instead of a custom `X-*`.** Any
  client library that understands the standard can back off automatically,
  instead of every caller having to learn your bespoke header.

</CalloutTip>
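
The first two hardening points can be sketched together: thresholds read from policy options, and a delay that ramps with usage instead of jumping to a flat 2 seconds. This is a sketch under assumptions, not a reference implementation. The option names and defaults are hypothetical, and it assumes the handler receives the policy's `options` object from `config/policies.json` as an extra argument.

```typescript
// Hypothetical option names; values would come from the policy's `options`.
interface FrictionOptions {
  meterName: string; // meter key on the subscription's entitlements
  slowAt: number; // usage ratio where the delay starts
  maxDelayAt: number; // usage ratio where the delay reaches its maximum
  maxDelayMs: number; // upper bound on the induced latency
}

const DEFAULTS: FrictionOptions = {
  meterName: "api_requests",
  slowAt: 0.95,
  maxDelayAt: 1.0,
  maxDelayMs: 2000,
};

// Fill in any option the policy config leaves out.
function resolveOptions(raw: Partial<FrictionOptions> = {}): FrictionOptions {
  return { ...DEFAULTS, ...raw };
}

// Linear ramp: 0 ms at `slowAt`, rising to `maxDelayMs` at `maxDelayAt`,
// and capped there for any usage beyond it.
function delayMs(used: number, o: FrictionOptions): number {
  if (used < o.slowAt) return 0;
  const span = o.maxDelayAt - o.slowAt;
  const t = span > 0 ? Math.min((used - o.slowAt) / span, 1) : 1;
  return Math.round(t * o.maxDelayMs);
}
```

With the defaults, a customer at 97.5% of plan waits about one second and a customer at or past 100% waits the full two. Changing "slow down at 95%" to "slow down at 90%" then becomes a config edit to `slowAt` rather than a code change.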

<CalloutDoc
  title="Monetization Policy Reference"
  description="Full reference for the monetization-inbound policy, the documented soft-limit example this pattern extends, and the subscription data model it exposes."
  href="https://zuplo.com/docs/policies/monetization-inbound"
  icon="book"
/>

## Where each fits

- **Hard limits** for free tiers, abuse prevention, and any entitlement where
  the marginal cost of an extra call is too high to eat.
- **Soft limits** for enterprise contracts, customers with payment on file and
  spending forecasts, and any call whose value exceeds the marginal cost.
- **Progressive friction** as the default for paid plans, where the goal is
  customers who upgrade rather than churn or rage-quit.

## Design, not default

A 429 in production at 3am and an invoice landing at five times budget are two
shapes of the same failure: the gateway had the usage data and didn't surface it
in time for anyone to act on it. That makes the gateway the right place to close
the gap, because it's the only thing that sees usage in real time, before the
backend does and before the customer does. Progressive friction is that
visibility turned into signals the customer can respond to.

It's the same logic behind Zuplo's `meterOnStatusCodes` defaulting to
`"200-299"`: failed requests shouldn't count against a customer's quota. Quota
enforcement is a sibling design decision, and one worth making deliberately
rather than by default.