Best Practices for API Error Handling

Adrian Machado
May 28, 2026

Design consistent API error responses with HTTP status codes, RFC 9457 Problem Details, error schemas, retry strategies, idempotency, and observability.

Every API will eventually return an error. The question is whether that error helps the developer fix the problem or sends them on a guessing game. Poorly structured error responses — vague messages, wrong status codes, missing context — lead to frustrated developers, wasted debugging hours, and a flood of support tickets.

This guide covers concrete strategies for building consistent, useful API error responses. You will learn how to select the right HTTP status codes, structure error payloads using the RFC 9457 Problem Details standard, handle errors across REST, GraphQL, and gRPC protocols, and implement retry logic, idempotency, and observability patterns that make errors actionable instead of opaque.

Here is what a well-structured error response looks like, following RFC 9457:

JSON
{
  "type": "https://api.example.com/errors/invalid-input",
  "title": "Invalid Input Parameters",
  "status": 422,
  "detail": "The 'email' field must use user@domain.com format",
  "instance": "/users/registration/2026-05-28/8042",
  "errors": [
    {
      "detail": "must be a valid email address",
      "pointer": "#/email"
    }
  ]
}

This response tells the developer exactly what went wrong, which field caused the problem, and how to fix it. That is the goal.

Choosing the Right HTTP Status Codes

HTTP status codes are the first signal a client receives about what happened. Using them correctly is table stakes for any API.

4xx codes indicate the client sent a bad request. The client needs to change something before retrying:

  • 400 Bad Request — Malformed syntax, invalid JSON, or missing required fields.
  • 401 Unauthorized — Missing or invalid authentication credentials. (Despite the name, this means “unauthenticated.”)
  • 403 Forbidden — Authenticated, but lacking permission for this operation.
  • 404 Not Found — The requested resource does not exist.
  • 409 Conflict — The request conflicts with the current state of the resource (e.g., a duplicate creation or concurrent update).
  • 422 Unprocessable Content — The request is syntactically valid but fails business validation rules.
  • 429 Too Many Requests — The client has exceeded its rate limit. Always include a Retry-After header.

5xx codes indicate the server failed to fulfill a valid request. These are often transient and worth retrying:

  • 500 Internal Server Error — An unhandled exception on the server side.
  • 502 Bad Gateway — The server, acting as a proxy, received an invalid upstream response.
  • 503 Service Unavailable — The server is temporarily down for maintenance or overloaded. Include a Retry-After header.
  • 504 Gateway Timeout — The upstream server did not respond in time.

The critical rule: use the most specific status code that applies. Returning 400 for every client error or 500 for every server error strips the response of useful information. If a user’s email fails validation, return 422 with field-level details — not a generic 400.
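As a rough illustration of picking the most specific code, the sketch below maps a handful of failure categories to the status codes listed above. The `FailureKind` names and the `statusFor` helper are hypothetical and not part of any framework; a real API would define its own categories.

```typescript
// Hypothetical failure categories an API handler might distinguish.
type FailureKind =
  | "malformed_request"
  | "unauthenticated"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "validation_failed"
  | "rate_limited"
  | "upstream_timeout"
  | "unexpected";

// Each category gets the most specific status code from the lists above.
const STATUS_BY_KIND: Record<FailureKind, number> = {
  malformed_request: 400,
  unauthenticated: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  validation_failed: 422,
  rate_limited: 429,
  upstream_timeout: 504,
  unexpected: 500,
};

function statusFor(kind: FailureKind): number {
  return STATUS_BY_KIND[kind];
}
```

Keeping the mapping in one table makes it harder for individual handlers to fall back to a generic 400 or 500.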

You can find a full reference of status codes on MDN. For deeper dives on specific codes, see our guides on HTTP 429 Too Many Requests and HTTP 431 Request Header Fields Too Large.

Structuring Error Responses with RFC 9457 Problem Details

Ad-hoc error formats are one of the biggest sources of friction in API integrations. Every API invents its own shape — { "error": "..." }, { "code": 123, "msg": "..." }, { "errors": [{ ... }] } — and consumers have to write custom parsing logic for each one.

RFC 9457 (Problem Details for HTTP APIs) solves this by defining a standard error response format. It is the successor to the widely adopted RFC 7807 and is fully backward compatible. If you are starting fresh, adopt RFC 9457. If you already use RFC 7807, your responses are already compliant.

The Five Standard Fields

A Problem Details response uses the application/problem+json content type and includes these fields (all optional, but all recommended):

  • type (URI) — Identifies the specific error type. When set to "about:blank", the title should match the standard HTTP status phrase.
  • title (string) — A short, human-readable summary. Should remain stable for a given type.
  • status (integer) — The HTTP status code, duplicated in the body for convenience.
  • detail (string) — A human-readable explanation specific to this occurrence.
  • instance (URI) — Identifies the specific request occurrence, useful for log correlation.
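The five fields translate naturally into a type. The sketch below is one possible shape, assuming a framework-neutral `ProblemResponse` object and a `toProblemResponse` helper (both hypothetical names) rather than any particular server library.

```typescript
// The five standard members; all are optional in the specification.
interface ProblemDetails {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
  // Extension members carry application-specific context.
  [extension: string]: unknown;
}

// Hypothetical, framework-neutral response shape used for illustration.
interface ProblemResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

function toProblemResponse(problem: ProblemDetails): ProblemResponse {
  // Assumption: fall back to 500 when the caller did not set a status.
  const status = problem.status ?? 500;
  return {
    status,
    headers: { "Content-Type": "application/problem+json" },
    // "about:blank" is the default type when none is supplied.
    body: JSON.stringify({ type: "about:blank", ...problem, status }),
  };
}
```

For example, `toProblemResponse({ title: "Not Found", status: 404 })` yields a 404 with the `application/problem+json` content type and a body whose `type` is `"about:blank"`.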

Extension Members and Validation Errors

RFC 9457 explicitly supports custom extension members beyond the five standard fields. This is where you add application-specific context. RFC 9457 illustrates how extension members can include an errors array for reporting multiple validation problems in a single response:

JSON
{
  "type": "https://api.example.com/errors/validation-error",
  "title": "Your request is not valid.",
  "status": 422,
  "errors": [
    {
      "detail": "must be a positive integer",
      "pointer": "#/age"
    },
    {
      "detail": "must be 'green', 'red', or 'blue'",
      "pointer": "#/profile/color"
    }
  ]
}

Returning all validation errors at once — rather than one at a time — saves developers from the frustrating loop of fix-one-submit-get-another-error.
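One way to collect every failure in a single pass is sketched below. The `SignupInput` shape and the `validateSignup` and `validationProblem` helpers are hypothetical names chosen to mirror the example above; a real API would more likely derive these checks from its schema.

```typescript
interface FieldError {
  detail: string;
  pointer: string;
}

// Hypothetical input shape, used only for this illustration.
interface SignupInput {
  age?: unknown;
  profile?: { color?: unknown };
}

const ALLOWED_COLORS = ["green", "red", "blue"];

// Runs every check and collects all failures instead of stopping at the first.
function validateSignup(input: SignupInput): FieldError[] {
  const errors: FieldError[] = [];

  const age = input.age;
  if (typeof age !== "number" || !Number.isInteger(age) || age <= 0) {
    errors.push({ detail: "must be a positive integer", pointer: "#/age" });
  }

  const color = input.profile?.color;
  if (typeof color !== "string" || ALLOWED_COLORS.indexOf(color) === -1) {
    errors.push({
      detail: "must be 'green', 'red', or 'blue'",
      pointer: "#/profile/color",
    });
  }

  return errors;
}

// Wraps the collected errors in a Problem Details body.
function validationProblem(errors: FieldError[]) {
  return {
    type: "https://api.example.com/errors/validation-error",
    title: "Your request is not valid.",
    status: 422,
    errors,
  };
}
```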

The response is sent with the following status line and headers, followed by the JSON body shown above:

http
HTTP/1.1 422 Unprocessable Content
Content-Type: application/problem+json
Content-Language: en

We talked with Erik Wilde, one of the authors of RFC 9457, about the design decisions behind the standard.

Zuplo and Problem Details

If you use an API gateway like Zuplo, Problem Details responses come built in. Zuplo defaults to the RFC 7807 Problem Details format for every error — from authentication failures to rate limit violations to request validation errors. Each response automatically includes a trace extension with a requestId, buildId, and timestamp for debugging.

You can also customize error responses programmatically using the HttpProblems helper from @zuplo/runtime, which provides methods for every HTTP status code (badRequest(), unauthorized(), tooManyRequests(), and so on) with support for custom detail, type, and extension fields.

Writing Clear, Secure Error Messages

A good error message answers three questions: What went wrong? Where did it happen? How do I fix it?

Do this:

JSON
{
  "type": "https://api.example.com/errors/invalid-input",
  "title": "Invalid Input Parameters",
  "status": 422,
  "detail": "The 'user_email' field must use user@example.com format",
  "instance": "/api/v1/users"
}

Not this:

JSON
{
  "error": "ValidationError: field_23",
  "message": "Check input and try again"
}

The first response tells the developer exactly which field failed and what format is expected. The second is useless.

Security Considerations

Error messages must balance helpfulness with security. Follow these rules:

  • Never expose internal details. Stack traces, database queries, file paths, and internal service names do not belong in API responses.
  • Use neutral authentication messages. Return the same error for “user not found” and “wrong password” to prevent account enumeration attacks. A message like "Invalid credentials" works for both cases.
  • Validate and sanitize inputs before including them in error messages to prevent injection attacks.
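The first two rules can be sketched as a pair of helpers. Everything here is illustrative: `loginProblem`, `internalProblem`, the `LoginFailure` reasons, and the `invalid-credentials` type URI are assumptions, not part of any library.

```typescript
interface PublicProblem {
  type: string;
  title: string;
  status: number;
  detail: string;
}

// Hypothetical login failure reasons tracked internally.
type LoginFailure = "user_not_found" | "wrong_password";

// Both reasons produce an identical response, so callers cannot probe for accounts.
function loginProblem(_reason: LoginFailure): PublicProblem {
  return {
    type: "https://api.example.com/errors/invalid-credentials",
    title: "Invalid credentials",
    status: 401,
    detail: "Invalid credentials",
  };
}

// Converts an unexpected exception into a generic 500.
// The stack trace and message go to the server-side log only.
function internalProblem(
  error: unknown,
  log: (entry: string) => void,
): PublicProblem {
  const internal =
    error instanceof Error ? (error.stack ?? error.message) : String(error);
  log(internal);
  return {
    type: "about:blank",
    title: "Internal Server Error",
    status: 500,
    detail: "An unexpected error occurred.",
  };
}
```

The internal reason is still available for logging and metrics; it just never reaches the response body.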

An API gateway can act as a safety net here. With a programmable gateway like Zuplo, you can write outbound policies that scan response bodies for sensitive data patterns — internal IP addresses, database connection strings, stack traces — and strip them before they reach the client. See our API security best practices guide for more on securing your API surface.

Error Handling by API Protocol

Different API protocols handle errors differently. Understanding these conventions is important when you build or consume APIs, and especially when you run mixed-protocol architectures behind a single gateway.

REST API Errors

REST APIs communicate errors through HTTP status codes paired with structured error payloads. The RFC 9457 Problem Details format described above is the current best practice for REST error responses. It supports content negotiation between JSON and XML, and its extension mechanism handles everything from simple validation errors to complex multi-step workflow failures.

GraphQL Errors

GraphQL servers conventionally respond with HTTP 200 OK even when errors occur, although some return a 4xx status for requests that cannot be parsed or validated. Error information lives in an errors array in the response body. This design enables partial success, where some fields resolve successfully while others fail:

JSON
{
  "data": {
    "user": {
      "name": "John Doe",
      "email": null
    }
  },
  "errors": [
    {
      "message": "Not authorized to access email field",
      "locations": [{ "line": 5, "column": 3 }],
      "path": ["user", "email"],
      "extensions": {
        "code": "FORBIDDEN"
      }
    }
  ]
}

The path field pinpoints exactly which data field triggered the error. The extensions.code field provides a machine-readable error code for programmatic handling (e.g., UNAUTHENTICATED, FORBIDDEN, BAD_USER_INPUT).
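On the client side, this means checking the errors array even when data is present. The sketch below shows one way to do that; the `summarize` helper and its return shape are hypothetical.

```typescript
interface GraphQLError {
  message: string;
  path?: (string | number)[];
  extensions?: { code?: string };
}

interface GraphQLResponse<T> {
  data?: T | null;
  errors?: GraphQLError[];
}

// Groups failed field paths by extensions.code and flags partial success.
function summarize<T>(response: GraphQLResponse<T>) {
  const errors = response.errors ?? [];
  const pathsByCode: Record<string, string[]> = {};

  for (const error of errors) {
    // Assumption: errors without a code are bucketed under "UNKNOWN".
    const code = error.extensions?.code ?? "UNKNOWN";
    const paths = pathsByCode[code] ?? [];
    paths.push((error.path ?? []).join("."));
    pathsByCode[code] = paths;
  }

  const data = response.data ?? null;
  return {
    data,
    // Partial success: some data came back alongside at least one error.
    partial: data !== null && errors.length > 0,
    pathsByCode,
  };
}
```

Applied to the response above, `summarize` reports a partial result with `"user.email"` listed under `FORBIDDEN`.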

For domain-specific errors, many GraphQL APIs use union types to make error states part of the schema itself, making every possible outcome explicit and type-safe. For more on GraphQL API design patterns, see our dedicated guide.

gRPC Errors

gRPC uses a fixed set of 17 status codes (0 through 16) defined by the gRPC specification. Every response includes a numeric code and a string message, with optional structured details via the google.rpc package:

JSON
{
  "code": 3,
  "message": "Invalid email format in user creation request",
  "details": [
    {
      "@type": "type.googleapis.com/google.rpc.ErrorInfo",
      "reason": "VALIDATION_ERROR",
      "domain": "user-service",
      "metadata": {
        "field": "email",
        "violation": "format"
      }
    }
  ]
}

A key difference from REST: well-formed gRPC responses always use HTTP 200 OK at the transport level. The actual error status is carried in the grpc-status trailer. Here are the most commonly used gRPC status codes and their HTTP equivalents:

  • INVALID_ARGUMENT (3) → 400 Bad Request
  • NOT_FOUND (5) → 404 Not Found
  • PERMISSION_DENIED (7) → 403 Forbidden
  • RESOURCE_EXHAUSTED (8) → 429 Too Many Requests
  • UNIMPLEMENTED (12) → 501 Not Implemented
  • INTERNAL (13) → 500 Internal Server Error
  • UNAVAILABLE (14) → 503 Service Unavailable
  • UNAUTHENTICATED (16) → 401 Unauthorized
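The mapping above can be expressed as a lookup table. This is a minimal sketch: `grpcToProblem` and the `grpcStatus` extension member are hypothetical names, and falling back to 500 for codes not in the list is an assumption.

```typescript
interface GrpcMapping {
  name: string;
  http: number;
}

// The gRPC-to-HTTP pairs listed above.
const GRPC_TO_HTTP = new Map<number, GrpcMapping>([
  [3, { name: "INVALID_ARGUMENT", http: 400 }],
  [5, { name: "NOT_FOUND", http: 404 }],
  [7, { name: "PERMISSION_DENIED", http: 403 }],
  [8, { name: "RESOURCE_EXHAUSTED", http: 429 }],
  [12, { name: "UNIMPLEMENTED", http: 501 }],
  [13, { name: "INTERNAL", http: 500 }],
  [14, { name: "UNAVAILABLE", http: 503 }],
  [16, { name: "UNAUTHENTICATED", http: 401 }],
]);

// Builds a Problem Details style body from a gRPC code and message.
function grpcToProblem(code: number, message: string) {
  // Assumption: codes outside the table are reported as a generic 500.
  const mapped = GRPC_TO_HTTP.get(code) ?? { name: "UNKNOWN", http: 500 };
  return {
    status: mapped.http,
    detail: message,
    grpcStatus: mapped.name, // hypothetical extension member
  };
}
```

In practice the upstream message should be reviewed before it is forwarded, since it may contain internal details.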

For a deeper comparison of these protocols, see our REST or gRPC guide.

Cross-Protocol Consistency

Regardless of which protocol you use, two principles remain constant: machine-readable error codes for programmatic handling and human-readable details for debugging. When you run APIs across multiple protocols, an API gateway can normalize error formats — translating gRPC status codes into Problem Details responses for REST consumers, for example — so that downstream clients get a consistent experience.

Retry Strategies and Backoff Patterns

Not every error is permanent. Transient failures — rate limits, temporary overloads, upstream timeouts — will often succeed on retry. But retrying incorrectly can make things worse, turning a brief spike into a sustained outage.

Which Errors to Retry

Retry these: 429 Too Many Requests, 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout.

Do not retry these: 400, 401, 403, 404, 422 — these are client errors that will fail identically on every attempt. Retrying them wastes resources and can trigger rate limits.

Exponential Backoff with Jitter

The standard retry pattern is exponential backoff: double the wait time after each failed attempt. But naive exponential backoff has a problem. If a server goes down and 1,000 clients fail simultaneously, they will all retry at the same intervals, creating a “thundering herd” that overwhelms the recovering server.

Jitter fixes this by randomizing the delay. Here is a practical implementation:

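TypeScript
// Status codes worth retrying; any other HTTP status is rethrown immediately.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      // Assumes fn() rejects with an error carrying a fetch-style `response`.
      // Errors without one (such as network failures) are treated as transient.
      const status = error?.response?.status;
      const retryable = status === undefined || RETRYABLE_STATUSES.has(status);
      if (!retryable || attempt === maxRetries - 1) throw error;

      // Honor Retry-After header if present (seconds or an HTTP date)
      const retryAfter = error?.response?.headers?.get("retry-after");
      let delay = NaN;

      if (retryAfter) {
        delay = isNaN(Number(retryAfter))
          ? new Date(retryAfter).getTime() - Date.now()
          : Number(retryAfter) * 1000;
      }
      if (isNaN(delay)) {
        // No usable Retry-After: exponential backoff with full jitter,
        // capped at maxDelayMs
        const exponentialDelay = Math.min(
          maxDelayMs,
          baseDelayMs * Math.pow(2, attempt),
        );
        delay = Math.random() * exponentialDelay;
      }

      await new Promise((resolve) =>
        setTimeout(resolve, Math.max(0, Math.min(delay, maxDelayMs))),
      );
    }
  }
  throw new Error("Unreachable");
}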

Key details in this implementation:

  • Always honor the Retry-After header when present. This header appears in 429 and 503 responses and tells you exactly when to retry, either as seconds (Retry-After: 120) or an HTTP date (Retry-After: Thu, 28 May 2026 14:30:00 GMT).
  • Full jitter (Math.random() * exponentialDelay) provides the best distribution of retry times across clients.
  • Cap your maximum delay and maximum retry count to prevent unbounded waits.
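The two Retry-After formats can also be parsed by a small pure function, which is easier to unit test than logic buried inside a retry loop. The sketch below is one possible version; `parseRetryAfter` is a hypothetical helper, and passing the current time as a parameter is a design choice made here for testability.

```typescript
// Returns the wait in milliseconds, or undefined when the header is
// missing or cannot be parsed.
function parseRetryAfter(
  value: string | null,
  nowMs: number,
): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;

  // Delay in seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // HTTP date, e.g. "Thu, 28 May 2026 14:30:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;

  // A date in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

A caller can fall back to jittered exponential backoff whenever this returns `undefined`.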

Idempotency Keys for Safe Error Recovery

Retries introduce a second problem: what happens when the request succeeded on the server but the response was lost in transit? Without safeguards, a retried POST /payments could create a duplicate charge.

Idempotency keys solve this. The client generates a unique key (typically a UUID) and sends it with the request. The server uses this key to deduplicate:

Terminal
curl -X POST https://api.example.com/v1/charges \
  -H "Authorization: Bearer sk_live_xxx" \
  -H "Idempotency-Key: 7c4a8d09-a910-41d6-b8b2-a7e4c2f1e3d5" \
  -H "Content-Type: application/json" \
  -d '{"amount": 2000, "currency": "usd"}'

How the server handles this:

  1. First request with this key: Process normally, store the key and response.
  2. Subsequent requests with the same key: Return the stored response without reprocessing.
  3. Same key but different parameters: Return an error — the key is bound to the original request payload.
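The three cases can be sketched with an in-memory store. This is a simplified illustration, not a production design: `IdempotencyStore` is a hypothetical class, the 422 status for a mismatched payload is an assumption (the right code depends on your API), and expiry, persistence, and concurrent requests are deliberately left out.

```typescript
interface StoredResponse {
  status: number;
  body: unknown;
}

interface StoredResult {
  fingerprint: string; // serialized form of the original request body
  response: StoredResponse;
}

class IdempotencyStore {
  private readonly results = new Map<string, StoredResult>();

  handle(
    key: string,
    body: unknown,
    run: (body: unknown) => StoredResponse,
  ): StoredResponse {
    // Note: JSON.stringify is sensitive to property order; a real
    // implementation would canonicalize or hash the payload.
    const fingerprint = JSON.stringify(body);
    const existing = this.results.get(key);

    // 1. First request with this key: process and store.
    if (existing === undefined) {
      const response = run(body);
      this.results.set(key, { fingerprint, response });
      return response;
    }

    // 3. Same key, different payload: reject.
    if (existing.fingerprint !== fingerprint) {
      return {
        status: 422,
        body: { title: "Idempotency key reused with a different payload" },
      };
    }

    // 2. Same key, same payload: replay the stored response.
    return existing.response;
  }
}
```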

Idempotency keys should have a defined lifetime (24 to 48 hours is a common choice) and are only needed for non-idempotent methods such as POST and PATCH. GET, PUT, and DELETE are idempotent by definition.

For more on implementing this pattern, see our guide to idempotency keys in REST APIs.

Observability and Error Tracking

Returning good error responses is only half the battle. You also need visibility into what is failing, how often, and why.

Correlation IDs

Every API request should carry a unique identifier that flows through your entire system — from the gateway, through backend services, into logs, and back in the error response. This lets you trace a single failed request across multiple services in seconds.
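A minimal sketch of the propagation step is shown below. The `x-request-id` header name and the `ensureRequestId` helper are assumptions for illustration; use whatever header your gateway or platform already emits.

```typescript
// Assumed header name; substitute the one your platform uses.
const REQUEST_ID_HEADER = "x-request-id";

// Reuses an incoming ID when present, otherwise generates and attaches one.
function ensureRequestId(
  headers: Map<string, string>,
  generate: () => string,
): string {
  const existing = headers.get(REQUEST_ID_HEADER);
  if (existing !== undefined && existing !== "") return existing;

  const id = generate();
  headers.set(REQUEST_ID_HEADER, id);
  return id;
}
```

Reusing an ID that arrived with the request is what allows one identifier to span the gateway, backend services, and logs.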

In Zuplo, every request automatically receives a zp-rid (request ID) header. This ID appears in the error response trace object, in gateway logs, and can be forwarded to your backend services for cross-system correlation.

Structured Logging

Log error events with structured data — not just free-text messages. Include:

  • Request ID (correlation ID)
  • HTTP method and path
  • Status code returned
  • Error type and detail
  • Timestamp
  • Client identifier (API key name, user ID — never the raw credential)

Structured logs make it possible to query, aggregate, and alert on error patterns. For example, you can set up alerts when the error rate for a specific endpoint exceeds a threshold, or when a particular client starts hitting validation errors at an unusual rate.
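A structured entry covering the fields listed above might look like the following. The field names and the `formatErrorLog` helper are illustrative assumptions rather than a required schema.

```typescript
interface ErrorLogEntry {
  requestId: string; // correlation ID
  method: string;
  path: string;
  status: number;
  errorType: string;
  detail: string;
  timestamp: string; // ISO 8601
  clientId: string; // API key name or user ID, never the raw credential
}

// Serializes one error event as a single JSON line.
function formatErrorLog(
  entry: Omit<ErrorLogEntry, "timestamp">,
  now: Date = new Date(),
): string {
  const full: ErrorLogEntry = { ...entry, timestamp: now.toISOString() };
  return JSON.stringify(full);
}
```

One JSON object per line keeps entries easy to query and aggregate in most log platforms.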

Key Metrics to Track

  • Error rate by endpoint — Which endpoints are failing most?
  • Error rate by status code — Are you seeing more 5xx (your problem) or 4xx (client problems)?
  • P95 error resolution time — How quickly are errors being investigated and fixed?
  • Error recurrence rate — Are the same errors coming back after fixes?
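The first metric can be computed directly from request records. The sketch below assumes a simplified `RequestRecord` shape and treats any status of 400 or above as an error; both are assumptions for illustration.

```typescript
interface RequestRecord {
  endpoint: string;
  status: number;
}

// Returns the fraction of requests per endpoint that ended in a 4xx or 5xx.
function errorRateByEndpoint(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, { total: number; errors: number }>();

  for (const record of records) {
    const entry = totals.get(record.endpoint) ?? { total: 0, errors: 0 };
    entry.total += 1;
    if (record.status >= 400) entry.errors += 1;
    totals.set(record.endpoint, entry);
  }

  const rates = new Map<string, number>();
  totals.forEach((entry, endpoint) => {
    rates.set(endpoint, entry.errors / entry.total);
  });
  return rates;
}
```

Filtering on `status >= 500` instead gives the server-side error rate, which separates your failures from client mistakes.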

Tools like Sentry and Raygun provide error aggregation with distributed tracing. Zuplo integrates with logging platforms like Datadog, Splunk, Dynatrace, and Google Cloud Logging through its log forwarding plugins, so error data flows directly into your existing observability stack.

For more on API observability tooling, see our guide to API monitoring tools.

Evolving Error Schemas Without Breaking Clients

Error response formats are part of your API contract. Changing them carelessly will break client integrations just like changing your success response schemas would.

When you need to evolve your error format:

  • Add optional fields instead of changing existing ones. Adding a new errors array alongside an existing detail field is safe. Removing detail is not.
  • Preserve legacy formats during transitions. Support both old and new formats simultaneously until clients have migrated.
  • Use semantic versioning for your API so clients can opt in to new error formats at their own pace.
  • Include deprecation notices in response headers when old error formats are being phased out.

For example, adding an errors array alongside an existing detail field is a safe, additive change:

JSON
// v1 — original error format
{
  "type": "https://api.example.com/errors/validation-error",
  "title": "Validation Error",
  "status": 422,
  "detail": "The 'email' field is invalid"
}

// v2 — additive change, backward compatible
{
  "type": "https://api.example.com/errors/validation-error",
  "title": "Validation Error",
  "status": 422,
  "detail": "The 'email' field is invalid",
  "errors": [
    { "detail": "must be a valid email address", "pointer": "#/email" }
  ]
}

Existing clients that only read detail continue to work. New clients can parse the errors array for field-level granularity.
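A client that tolerates both shapes could be written along these lines. `extractMessages` is a hypothetical helper; it prefers the field-level errors array when present and otherwise falls back to detail.

```typescript
interface FieldError {
  detail: string;
  pointer: string;
}

interface ValidationProblem {
  detail?: string;
  errors?: FieldError[];
}

// Works with both the v1 body (detail only) and the v2 body (detail plus errors).
function extractMessages(problem: ValidationProblem): string[] {
  if (Array.isArray(problem.errors) && problem.errors.length > 0) {
    return problem.errors.map((e) => `${e.pointer}: ${e.detail}`);
  }
  return problem.detail !== undefined ? [problem.detail] : [];
}
```

Because the function ignores fields it does not recognize, later additive changes to the error body will not break it.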

An API gateway is particularly useful during error format migrations. You can use outbound policies to transform upstream error responses into your target format, giving you a consistent error contract at the gateway level while your backend services migrate incrementally.

Testing Your Error Handling

Your error handling code deserves the same testing rigor as your happy-path logic. Three approaches work well together:

  1. Schema validation — If you have defined error response schemas in your OpenAPI spec, validate live responses against those schemas. This is called contract testing, and it catches regressions where an error response drifts from its documented format.

  2. Automated end-to-end tests — Write tests that intentionally trigger every error path: invalid inputs, missing authentication, nonexistent resources, rate limits. Tools like Playwright and StepCI work well for end-to-end API testing.

  3. Mock API simulation — Use mock APIs to simulate error scenarios like network timeouts, 500 errors, and malformed responses. This lets you verify your client-side error handling without depending on a live backend.
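For the schema validation approach, a test suite can assert that every error body has the expected shape. The hand-rolled check below is only a sketch with a hypothetical `problemDetailsViolations` helper; in practice you would more likely validate against your OpenAPI schema with a JSON Schema validator.

```typescript
// Returns a list of violations; an empty list means the body looks like
// a well-formed Problem Details object.
function problemDetailsViolations(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }

  const record = body as Record<string, unknown>;
  const violations: string[] = [];

  // All standard members are optional, but must be strings when present.
  for (const field of ["type", "title", "detail", "instance"]) {
    const value = record[field];
    if (value !== undefined && typeof value !== "string") {
      violations.push(`${field} must be a string`);
    }
  }

  const status = record["status"];
  if (status !== undefined) {
    const valid =
      typeof status === "number" &&
      Number.isInteger(status) &&
      status >= 100 &&
      status <= 599;
    if (!valid) {
      violations.push("status must be an integer between 100 and 599");
    }
  }

  return violations;
}
```

A test can call this on each captured error response and fail when the returned list is not empty.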

If you use Zuplo’s request validation policy, your API will automatically reject requests that do not match your OpenAPI schema and return a Problem Details response with field-level error pointers — no custom validation code required.

Summary

Good API error handling is not about catching every edge case. It is about building a consistent, predictable system that helps developers understand and recover from failures quickly.

The key practices covered in this guide:

  • Use specific HTTP status codes: 422 for validation errors, 429 for rate limits, 503 for temporary outages. Never overload 400 or 500.
  • Adopt RFC 9457 Problem Details — A standard format with type, title, status, detail, and instance fields, plus extension members for application-specific context.
  • Follow protocol conventions — REST uses status codes and response bodies, GraphQL uses errors arrays, gRPC uses its own status codes.
  • Implement retry logic correctly — Exponential backoff with jitter for transient errors, and always honor Retry-After headers.
  • Use idempotency keys — Prevent duplicate side effects when retrying mutating operations.
  • Invest in observability — Correlation IDs, structured logging, and error-rate monitoring turn error data into actionable insights.

If you are looking for a platform that handles many of these patterns out of the box — Problem Details responses, request validation, rate limiting, and log forwarding — try Zuplo for free.
