---
title: "How API Metering, Features and Quota Enforcement Work"
description: "Welcome to API Monetization 101! Learn how API metering and quota enforcement work: meters, features, hard vs soft limits, and how to enforce usage at the gateway."
canonicalUrl: "https://zuplo.com/blog/2026/02/24/api-monetization-metering-and-enforcement"
pageType: "blog"
date: "2026-02-24"
authors: "martyn"
tags: "API Monetization 101"
image: "https://zuplo.com/og?text=API%20Monetization%20101%3A%20Metering%20%26%20Enforcement"
---

API monetization has more moving parts than it looks from the outside: meters, features, plans, subscriptions, entitlements, enforcement. This series breaks them down, starting with the foundation: tracking usage and acting on it.

## What Metering Actually Means

Metering is counting things. But "things" is doing a lot of work in that sentence.

The obvious answer is API requests. A customer makes a call; you increment a counter. This works fine for straightforward APIs where every request costs you roughly the same amount to serve.

But what if you're wrapping an LLM? A request that generates 50 tokens and one that generates 4,000 tokens don't cost the same to serve. Charging per request penalizes your lightweight users and subsidizes the heavy ones.

Or what if you're serving files? A 1KB JSON response and a 500MB video download shouldn't count the same way.

This is why modern metering systems don't just count requests. They track **usage dimensions**: the specific unit that correlates with your cost to serve or the value you deliver.

## Three Common Metering Patterns

**Request counting** is the baseline. Every API call increments a counter by one. Use this when your requests are roughly uniform in cost, or when simplicity matters more than precision.

```json
{
  "slug": "api_requests",
  "name": "API Requests",
  "eventType": "api_request",
  "aggregation": "COUNT"
}
```

**Token metering** is essential for AI applications.
Your backend calls the model, gets a token count back, and reports it. Now you can price the way LLM providers like OpenAI do: per thousand or per million tokens, with different rates for input and output (tracked as separate meters) if you want to get granular.

```json
{
  "slug": "tokens_total",
  "name": "Token Usage",
  "eventType": "completion",
  "aggregation": "SUM",
  "valueProperty": "$.tokens"
}
```

**Data transfer metering** tracks bytes. It's useful for file storage APIs, CDNs, or any service where bandwidth is a real cost.

```json
{
  "slug": "data_transfer",
  "name": "Data Transfer (bytes)",
  "eventType": "response",
  "aggregation": "SUM",
  "valueProperty": "$.bytes"
}
```

The key insight: a meter watches for a specific event type, extracts a numeric value using a JSONPath expression (`valueProperty`), and aggregates it. `COUNT` simply counts matching events and needs no value; `SUM` adds up the extracted values. You control what events you send and what values they contain. The metering system just does the math.

## From Meters to Features

Meters track raw usage. But customers don't buy meters. They buy features: "10,000 API calls per month" or "1 million tokens included."

A **feature** connects a meter to your product catalog. It's the thing you put on your pricing page and enforce at the gateway.

Features come in two flavors.

**Metered features** link to a meter. When you include a metered feature in a plan, you can set quotas: how much usage is included, whether the limit is hard or soft, and what happens on overage.

```json
{
  "key": "api_calls",
  "name": "API Calls",
  "meterSlug": "api_requests"
}
```

**Static features** have no meter. They're boolean: you either have access or you don't. Use these for capabilities that aren't about consumption: priority support, access to premium endpoints, beta features.

```json
{
  "key": "priority_support",
  "name": "Priority Support"
}
```

The distinction matters for enforcement, which we'll get to next.

## Enforcement: Where Metering Meets Access Control

Usage data is valuable, but the real question is what happens when a customer hits their limit.
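
Before looking at the gateway side, it helps to pin down what "hitting a limit" means in numbers. Below is a minimal TypeScript sketch, not Zuplo's implementation, that aggregates events the way the meter definitions above describe (match on `eventType`, then `COUNT` the events or `SUM` the value at `valueProperty`) and subtracts the total from an included quota. The `Meter`, `UsageEvent`, `aggregate`, and `remainingBalance` names are hypothetical, the event shape (a `type` plus a `data` object) is an assumption, and only simple top-level paths like `$.tokens` are resolved rather than full JSONPath.

```typescript
// Hypothetical types mirroring the meter fields shown above; not a real SDK.
type Aggregation = "COUNT" | "SUM";

interface Meter {
  slug: string;
  eventType: string;
  aggregation: Aggregation;
  valueProperty?: string; // e.g. "$.tokens"; only needed for SUM
}

// Assumed event shape: an event type plus a JSON payload.
interface UsageEvent {
  type: string;
  data: Record<string, unknown>;
}

// Resolve a simple top-level path such as "$.tokens".
// Full JSONPath (nested paths, filters) is out of scope for this sketch.
function readValue(path: string, data: Record<string, unknown>): number {
  if (!path.startsWith("$.")) {
    throw new Error(`Unsupported path: ${path}`);
  }
  const value = data[path.slice(2)];
  if (typeof value !== "number") {
    throw new Error(`Expected a number at ${path}`);
  }
  return value;
}

// Aggregate every event whose type matches the meter's eventType.
function aggregate(meter: Meter, events: UsageEvent[]): number {
  const matching = events.filter((e) => e.type === meter.eventType);
  if (meter.aggregation === "COUNT") {
    return matching.length; // each matching event counts as 1
  }
  if (!meter.valueProperty) {
    throw new Error(`SUM meter ${meter.slug} needs a valueProperty`);
  }
  const path = meter.valueProperty;
  return matching.reduce((total, e) => total + readValue(path, e.data), 0);
}

// Remaining balance for a metered feature with an included quota.
// A negative result means the customer has gone into overage.
function remainingBalance(included: number, used: number): number {
  return included - used;
}

const tokenMeter: Meter = {
  slug: "tokens_total",
  eventType: "completion",
  aggregation: "SUM",
  valueProperty: "$.tokens",
};

const events: UsageEvent[] = [
  { type: "completion", data: { tokens: 50 } },
  { type: "completion", data: { tokens: 4000 } },
  { type: "api_request", data: {} }, // ignored by the token meter
];

const used = aggregate(tokenMeter, events);
console.log(used, remainingBalance(1_000_000, used)); // prints: 4050 995950
```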
Enforcement is where metering connects to your API gateway. When a request comes in, the enforcement layer checks:

1. Does this customer have an active subscription?
2. Do they have an entitlement for the meters this endpoint requires?
3. Is their balance sufficient (for metered features)?
4. Is their payment current?

If any check fails, the request is rejected before it reaches your backend. This is important: enforcement happens at the gateway, not in your application code, so your backend never does work for a request that shouldn't be served.

## Hard Limits vs Soft Limits

When a customer exhausts their quota, you have two options.

**Hard limits** block the request. The customer gets a `429 Too Many Requests` or `402 Payment Required` response, and their integration stops working until the next billing cycle or until they upgrade. This protects you from runaway usage but creates a harsh experience.

**Soft limits** allow the request but flag it as overage. The customer keeps working, and you bill them for the extra usage at the end of the period. This is friendlier, but you have to deal with customers who rack up charges they can't or won't pay.

The right choice depends on your business model. Soft limits work well when you trust your customers and want to maximize usage. Hard limits make sense when you need predictable costs or when your customers are developers who expect strict quotas. Most mature API products offer both: hard limits on free tiers, soft limits with overage billing on paid plans.

## What Gets Checked

The enforcement policy examines the meters configured for each route. If your endpoint requires the `api_requests` meter, the policy checks whether the customer's subscription includes an entitlement for that meter and whether they have remaining balance.

This means different endpoints can require different meters. A single request can also consume multiple meters: one API call that also transfers data, for example.

The policy also checks payment status. Overdue invoice? Expired subscription?
The request is blocked at the edge, not after your backend has done the work.

Note that static features (boolean entitlements) need separate enforcement logic, since they aren't meter-based.

## Designing Your Metering Strategy

Before you create meters, think about what you're actually selling.

**If you're selling access**, request counting is probably fine. Your value is the API itself, and customers pay for the privilege of calling it.

**If you're selling compute**, meter the compute. For LLM wrappers, that's tokens. For image processing, maybe it's pixels or processing time. For search, it might be documents scanned.

**If you're selling data**, meter the data: bytes transferred, records returned, storage consumed.

**If you're selling a combination**, use multiple meters. A plan might include 10,000 API calls AND 1 million tokens AND 10GB of transfer. Each gets its own meter, its own entitlement, its own limit.

The goal is alignment: your meter should track the thing that costs you money to provide or the thing your customer values receiving. When those align, pricing feels fair to everyone.

## What's Next?

That covers the foundation: meters track usage, features connect them to your product, and enforcement acts on the limits. In the next part of this series, we'll cover plans and phases: how to structure pricing tiers, free trials, and automatic transitions.