
gRPC API Gateway: Protocol Translation, Load Balancing, and Observability

Nate Totten
March 16, 2026
14 min read

Learn how API gateways handle gRPC traffic — from protocol translation and load balancing to authentication, observability, and gRPC-Web support.

gRPC has become the default choice for high-performance microservices communication. Its binary serialization with Protocol Buffers, native HTTP/2 transport, and built-in streaming make it significantly faster than REST for internal service-to-service calls. But the moment you need to expose gRPC services to external consumers, browsers, or mobile clients, you run into a wall of compatibility challenges.

That is where an API gateway comes in. A gRPC API gateway sits between your clients and your gRPC backends, handling protocol translation, load balancing, authentication, and observability — so your services can speak gRPC internally while remaining accessible to the outside world.


Why gRPC Needs a Gateway

gRPC works beautifully inside a controlled environment where every client and server speaks the same protocol. The problems start when you step outside that boundary:

  • Browsers cannot call gRPC directly — The browser’s Fetch API does not expose the low-level HTTP/2 framing that gRPC requires. There is no way to force HTTP/2 or access raw HTTP/2 frames from JavaScript. This means web applications need a translation layer to interact with gRPC services.
  • External consumers expect REST/JSON — Most third-party developers, partners, and legacy systems are built around REST conventions. Asking them to adopt gRPC, Protocol Buffers, and code generation creates unnecessary friction for API adoption.
  • Load balancing breaks with HTTP/2 — gRPC multiplexes many requests over a single long-lived TCP connection. Traditional Layer 4 (L4) load balancers distribute connections, not individual requests, which means all traffic from one client can land on a single backend pod while others sit idle.
  • Cross-cutting concerns need a centralized layer — Authentication, rate limiting, logging, and metrics should not be duplicated in every gRPC service. A gateway enforces these policies consistently across your entire API surface.

For a deeper comparison of when to use each protocol, see REST or gRPC? A Guide to Efficient API Design.

gRPC-to-REST Transcoding

The most common gateway pattern for gRPC is transcoding: automatically translating between HTTP/JSON requests and gRPC calls. This lets you maintain a single gRPC service definition while serving both gRPC and REST clients.

How Transcoding Works

Transcoding relies on annotations in your .proto files that map gRPC methods to HTTP endpoints. The standard approach uses google.api.http annotations (declared in Google's google/api/http.proto), whose mapping rules are described in the AIP-127 transcoding specification:

protobuf
syntax = "proto3";

import "google/api/annotations.proto";

service UserService {
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/v1/users/{user_id}"
    };
  }

  rpc CreateUser(CreateUserRequest) returns (User) {
    option (google.api.http) = {
      post: "/v1/users"
      body: "*"
    };
  }
}

When a client sends GET /v1/users/123, the gateway translates this into a GetUser gRPC call with user_id set to 123. The gRPC response is serialized back to JSON and returned to the client. The client never needs to know that gRPC is involved.
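The path-matching half of this translation can be sketched in a few lines. This is an illustrative simplification, not how any particular gateway is implemented: it compiles a google.api.http-style path template into a regex and extracts the named fields that would populate the gRPC request message. The route table and RPC names mirror the .proto annotations above.

```python
import re

def compile_template(template):
    """Convert a path template like '/v1/users/{user_id}' into a regex
    with a named capture group for each field placeholder."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")

def match_route(method, path, routes):
    """Return (rpc_name, extracted_fields) for the first matching route,
    or None if nothing matches."""
    for (http_method, template), rpc_name in routes.items():
        if http_method != method:
            continue
        m = compile_template(template).match(path)
        if m:
            return rpc_name, m.groupdict()
    return None

# Routes mirroring the annotations in the .proto above
routes = {
    ("GET", "/v1/users/{user_id}"): "GetUser",
    ("POST", "/v1/users"): "CreateUser",
}

print(match_route("GET", "/v1/users/123", routes))
# ('GetUser', {'user_id': '123'})
```

A real transcoder also validates field types against the descriptor set and maps query parameters and request bodies, but the core routing step is this template match.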

Transcoding Implementations

Several tools handle gRPC-JSON transcoding, each with different trade-offs:

  • Envoy’s gRPC-JSON Transcoder filter — Envoy reads your protobuf descriptor set and performs transcoding inline as a filter. This is the most common approach in Kubernetes environments where Envoy is already the data plane proxy.
  • grpc-gateway — An open-source Go project that generates a reverse-proxy server from your annotated .proto files. It produces a standalone Go HTTP server that translates REST calls to gRPC. This is a popular choice in Go ecosystems.
  • ASP.NET Core gRPC JSON transcoding — Microsoft’s built-in transcoding for .NET applications. It runs inside the same ASP.NET Core process as your gRPC service, avoiding the overhead of a separate proxy.
  • Google Cloud Endpoints — Google’s managed service that provides transcoding for gRPC services deployed on GCP, using the same google.api.http annotations.
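For the Envoy approach, the transcoder is enabled as an HTTP filter that points at a compiled protobuf descriptor set. The sketch below assumes a descriptor file generated with `protoc --descriptor_set_out`; the file path and service name are placeholders for your own build artifacts.

```yaml
http_filters:
  - name: envoy.filters.http.grpc_json_transcoder
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
      # Descriptor set produced by: protoc --descriptor_set_out=user_service.pb ...
      proto_descriptor: "/etc/envoy/user_service.pb"
      services: ["UserService"]
      print_options:
        always_print_primitive_fields: true
```

Requests matching the google.api.http annotations are transcoded; other requests pass through to the router unchanged.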

For a hands-on walkthrough of the grpc-gateway approach, including code generation and OpenAPI documentation, see our blog post on gRPC API Gateway: Bridging the Gap Between REST and gRPC.

REST-to-gRPC Bridging

Transcoding works in both directions. When your backend services use gRPC but your external API contract is REST, the gateway accepts REST requests and forwards them as gRPC calls. This is particularly useful when:

  • Migrating from REST to gRPC incrementally — You can move individual services to gRPC without changing your public API contract. Consumers continue calling REST endpoints while the gateway handles the translation.
  • Supporting legacy integrations — Partners and internal systems that cannot adopt gRPC continue using the REST interface they already depend on.
  • Providing a unified API surface — A single OpenAPI specification describes your REST endpoints, while the gateway routes traffic to the appropriate gRPC backends.

The gateway handles the full request lifecycle: parsing JSON request bodies into Protocol Buffer messages, mapping HTTP path and query parameters to gRPC fields, invoking the gRPC service, and serializing the protobuf response back to JSON.

gRPC Load Balancing

Load balancing gRPC traffic is fundamentally different from load balancing REST APIs, and getting it wrong is one of the most common operational issues teams face when adopting gRPC.

The HTTP/2 Multiplexing Problem

REST APIs typically use HTTP/1.1, where each request opens a new TCP connection (or reuses one from a pool with clear request boundaries). Load balancers distribute these connections across backend instances, and because each connection carries roughly one request, the load spreads evenly.

gRPC uses HTTP/2, which multiplexes many concurrent requests (streams) over a single TCP connection. A gRPC client opens one connection to a backend and sends all its requests over that connection. If you are using an L4 load balancer, it sees one connection and routes all traffic to one backend — even if you have ten other backends sitting idle.

This creates the “sticky connection” problem that is especially visible in Kubernetes, where the default kube-proxy load balancer operates at Layer 4.

L4 vs L7 Load Balancing for gRPC

  • Layer 4 (transport) — Distributes based on TCP connections. Fast and lightweight, but blind to individual gRPC requests within a connection. For gRPC, L4 load balancing results in uneven request distribution because multiple RPCs share one connection.
  • Layer 7 (application) — Understands HTTP/2 and can distribute individual gRPC requests across backends. The load balancer terminates the client’s HTTP/2 connection and opens separate connections to each backend, routing each RPC independently.

For gRPC workloads, Layer 7 load balancing is almost always the right choice. The additional processing overhead is minimal compared to the cost of sending all traffic to a single backend.
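The skew is easy to see in a toy simulation. This hypothetical model pins each client's connection to one backend (the L4 behavior) versus distributing individual RPCs round-robin (the L7 behavior); the numbers are arbitrary but the imbalance is the point.

```python
import random

def l4_distribution(clients, rpcs_per_client, backends):
    """L4: each client's long-lived connection is pinned to one backend,
    so every RPC multiplexed on that connection lands on the same pod."""
    load = {b: 0 for b in range(backends)}
    for _ in range(clients):
        pinned = random.randrange(backends)
        load[pinned] += rpcs_per_client
    return load

def l7_distribution(clients, rpcs_per_client, backends):
    """L7: the proxy terminates HTTP/2 and round-robins each RPC."""
    load = {b: 0 for b in range(backends)}
    for i in range(clients * rpcs_per_client):
        load[i % backends] += 1
    return load

random.seed(7)
print(l4_distribution(clients=2, rpcs_per_client=500, backends=4))
print(l7_distribution(clients=2, rpcs_per_client=500, backends=4))
```

With two clients and four backends, L4 balancing leaves at least two backends completely idle no matter how the connections land, while L7 balancing gives every backend exactly 250 RPCs.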

Approaches to gRPC Load Balancing

There are three main strategies, each suited to different architectures:

Proxy-based (server-side) load balancing places a Layer 7 proxy between clients and backends. The proxy terminates the client’s HTTP/2 connection and distributes individual RPCs. Envoy, Linkerd, and NGINX all support this pattern. This is the simplest approach and works well for most deployments.

Client-side load balancing uses gRPC’s built-in name resolution and load balancing APIs. The client discovers backend instances (typically via DNS or a service registry) and distributes RPCs directly, without a proxy in the path. This eliminates the proxy hop but adds complexity to client configuration.

Service mesh deploys a sidecar proxy alongside each service instance. The sidecar handles L7 load balancing, retries, circuit breaking, and mTLS transparently. Istio (with Envoy sidecars) and Linkerd are the most common service mesh options for gRPC workloads.

For more on gateway-level traffic management patterns, see API Gateway Traffic Management: Routing and Load Balancing.

gRPC Health Checking and Service Discovery

gRPC defines its own health checking protocol, separate from HTTP health checks. An API gateway or load balancer that manages gRPC backends needs to speak this protocol to determine which instances are healthy.

The gRPC Health Checking Protocol

The protocol defines a grpc.health.v1.Health service with a Check RPC. The ServingStatus enum includes four values: UNKNOWN, SERVING, NOT_SERVING, and SERVICE_UNKNOWN (used by the Watch RPC). In practice, the Check RPC returns one of two statuses for known services:

  • SERVING — The service is healthy and accepting traffic.
  • NOT_SERVING — The service is unhealthy and should be removed from the load balancing pool.

If the requested service name is not registered, the server returns a NOT_FOUND gRPC status.

Unlike HTTP health checks (where you hit a /healthz endpoint and check for a 200 status code), gRPC health checks require a client that can make gRPC calls. This means your gateway or load balancer must support the gRPC health checking protocol natively.
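The protocol itself is small. Its service definition, from the standard grpc.health.v1 health.proto, looks like this:

```protobuf
syntax = "proto3";

package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
  }
  ServingStatus status = 1;
}

service Health {
  // Unary check of the named service's current status.
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);

  // Server streaming: pushes a new status whenever it changes.
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
```

An empty service name in the request conventionally queries the overall health of the server process rather than an individual service.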

Kubernetes Integration

Kubernetes has supported native gRPC health probes since version 1.24 (beta, enabled by default), with the feature graduating to GA in version 1.27. You can configure liveness, readiness, and startup probes that call the gRPC health service directly:

yaml
livenessProbe:
  grpc:
    port: 50051
    service: "my-service"
  initialDelaySeconds: 10
  periodSeconds: 10

For older Kubernetes versions, the grpc-health-probe utility provides the same functionality as a sidecar or init container. The Kubernetes Gateway API also added GRPCRoute as a stable resource in v1.1, giving gRPC services first-class routing support alongside HTTP routes.

Authentication and Authorization for gRPC

Authentication at the gateway is one of the strongest reasons to use a gRPC API gateway. Without it, every service must implement its own credential validation — leading to inconsistent security, duplicated logic, and a larger attack surface.

Common Authentication Patterns

API key authentication — The gateway validates an API key (typically in a header or metadata field) before forwarding the request to the backend. This is the simplest approach for external APIs and works identically for REST and gRPC traffic when the gateway handles both.

JWT validation — The gateway verifies a JSON Web Token’s signature, expiry, and claims at the edge. Valid tokens are forwarded (often with claims extracted into gRPC metadata), and expired or invalid tokens are rejected before they reach your services.

mTLS (mutual TLS) — Both the client and server present certificates for mutual authentication. This is common for service-to-service communication where both parties are within the same trust domain. The gateway can terminate mTLS at the edge and use internal mTLS or plaintext to communicate with backends.

OAuth 2.0 token introspection — The gateway validates bearer tokens by introspecting them against an authorization server. This is common in enterprise environments where a centralized identity provider manages access tokens.

gRPC Metadata for Identity Propagation

Once the gateway authenticates a request, it needs to pass the authenticated identity to the backend service. In gRPC, this is done through metadata — key-value pairs attached to each RPC call, analogous to HTTP headers. The gateway extracts identity information (user ID, scopes, roles) from the validated credential and injects it as gRPC metadata that backends can trust without re-validating.
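A minimal sketch of that injection step, with a hypothetical `x-auth-*` key prefix (metadata keys and the claim names here are illustrative, not a standard):

```python
def claims_to_metadata(claims):
    """Map validated JWT claims to gRPC metadata pairs.

    gRPC metadata keys must be lowercase. A gateway-owned prefix such
    as 'x-auth-*' namespaces the injected identity so backends can
    trust it, provided the gateway strips any client-supplied metadata
    under the same prefix to prevent spoofing.
    """
    metadata = [
        ("x-auth-user-id", str(claims["sub"])),
        ("x-auth-scopes", " ".join(claims.get("scope", "").split())),
    ]
    if "roles" in claims:
        metadata.append(("x-auth-roles", ",".join(claims["roles"])))
    return metadata

claims = {"sub": "user-123", "scope": "read:users write:users", "roles": ["admin"]}
print(claims_to_metadata(claims))
```

The resulting pairs are attached to the outgoing RPC the same way HTTP headers would be forwarded by a REST gateway.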

gRPC Observability Through the Gateway

The gateway is the natural instrumentation point for gRPC traffic. Because every request passes through it, you get comprehensive visibility without modifying your application code.

Distributed Tracing

gRPC supports distributed tracing through OpenTelemetry, which provides standardized trace propagation across services. The gateway can:

  • Inject trace context into incoming requests that lack it
  • Propagate existing trace headers to backend gRPC calls
  • Record spans for gateway-level processing (authentication, rate limiting, transcoding)

This gives you end-to-end visibility into request latency, from the client through the gateway to the backend service and back.

Metrics

Key gRPC metrics to capture at the gateway include:

  • Request rate — RPCs per second, broken down by service and method
  • Error rate — Percentage of RPCs returning non-OK gRPC status codes
  • Latency distribution — P50, P95, and P99 latency for each method
  • Active streams — Number of concurrent gRPC streams (important for streaming RPCs)

These metrics map to the RED method (Rate, Errors, Duration) that is standard for microservices monitoring.
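As an illustration of the aggregation involved (real gateways use histogram-based metrics libraries, not a sorted list), a toy per-method RED aggregator might look like this:

```python
from bisect import insort

class RpcStats:
    """Toy RED-style aggregator for one gRPC method (illustrative only)."""

    def __init__(self):
        self.latencies = []   # kept sorted, in milliseconds
        self.total = 0
        self.errors = 0

    def record(self, latency_ms, status="OK"):
        """Record one finished RPC with its gRPC status code."""
        self.total += 1
        if status != "OK":
            self.errors += 1
        insort(self.latencies, latency_ms)

    def error_rate(self):
        return self.errors / self.total if self.total else 0.0

    def percentile(self, p):
        """Nearest-rank percentile over observed latencies."""
        idx = max(0, round(p / 100 * len(self.latencies)) - 1)
        return self.latencies[idx]

stats = RpcStats()
for ms in [5, 7, 8, 9, 12, 15, 20, 45, 80, 200]:
    stats.record(ms)
stats.record(30, status="UNAVAILABLE")  # one failed RPC out of eleven
print(stats.error_rate())
print(stats.percentile(95))
```

Production systems replace the sorted list with fixed-bucket histograms so that percentiles stay cheap at high request rates.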

Logging

The gateway can log gRPC request metadata (method, service, status code, latency) without inspecting the binary protobuf payload. For debugging, some gateways support logging the decoded JSON representation of protobuf messages, though this adds overhead and should be used selectively.

gRPC-Web and Browser Support

Browsers cannot make native gRPC calls because the Fetch API does not expose HTTP/2 framing at the level gRPC requires. The gRPC-Web protocol solves this by defining a compatibility layer that works over HTTP/1.1 and HTTP/2 without requiring low-level frame access.

How gRPC-Web Works

gRPC-Web modifies the standard gRPC protocol in several ways:

  • It supports both HTTP/1.1 and HTTP/2 transports
  • gRPC trailers are sent in the response body instead of HTTP/2 trailing headers (which browsers cannot access)
  • A proxy translates between the gRPC-Web wire format and standard gRPC

The client-side library (grpc-web on npm) handles serialization and deserialization of Protocol Buffer messages and manages the gRPC-Web framing. Server-side, a proxy (most commonly Envoy) receives gRPC-Web requests and forwards them as standard gRPC to your backend services.
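The framing itself is compact: every gRPC-Web frame carries a 5-byte prefix of one flag byte plus a 4-byte big-endian payload length, and the trailer frame is marked by setting the most significant bit of the flag byte. A small decoder sketch (the example payload bytes are made up):

```python
import struct

GRPC_WEB_TRAILER_FLAG = 0x80

def decode_grpc_web_frames(body):
    """Split a gRPC-Web response body into (is_trailer, payload) frames.

    Each frame has a 5-byte prefix: 1 flag byte (MSB set marks the
    trailer frame) and a 4-byte big-endian payload length.
    """
    frames, offset = [], 0
    while offset < len(body):
        flags, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        payload = body[offset:offset + length]
        offset += length
        frames.append((bool(flags & GRPC_WEB_TRAILER_FLAG), payload))
    return frames

# One message frame followed by one trailer frame, as a browser receives them
msg = b"\x0a\x03Ann"                 # illustrative protobuf payload
trailers = b"grpc-status:0\r\n"
body = (bytes([0]) + struct.pack(">I", len(msg)) + msg
        + bytes([GRPC_WEB_TRAILER_FLAG]) + struct.pack(">I", len(trailers)) + trailers)
print(decode_grpc_web_frames(body))
```

Because the trailers travel in the body, a plain Fetch response stream is enough to read both the messages and the final gRPC status.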

Browser Streaming Limitations

gRPC-Web supports server-side streaming (the server sends a stream of messages to the client), but client-side streaming and bidirectional streaming are not supported in browsers. This is a limitation of browser APIs: the Fetch API specification includes streaming request bodies, but browser support remains partial and inconsistent, so gRPC-Web cannot depend on them.

If your application requires bidirectional streaming from the browser, consider using WebSockets alongside gRPC-Web, or evaluate ConnectRPC — a newer protocol family that offers enhanced browser support with gRPC-compatible backends.

gRPC-Web Proxies

Several proxies support the gRPC-Web protocol:

  • Envoy — The official default proxy for gRPC-Web, with built-in support that requires minimal configuration. New gRPC-Web features are implemented in Envoy first.
  • grpc-web Go proxy — A lightweight alternative for Go applications.
  • Apache APISIX — Includes a gRPC-Web plugin for environments already using APISIX.
  • NGINX — Supports native gRPC proxying but does not have built-in gRPC-Web transcoding. A separate proxy like Envoy is needed for gRPC-Web support.

Performance Considerations

gRPC is fast by default, but the gateway layer introduces trade-offs that you should understand.

Protobuf Serialization Overhead

When the gateway performs transcoding (converting between JSON and protobuf), it adds serialization and deserialization overhead. For high-throughput APIs, this can be significant:

  • JSON → Protobuf — The gateway parses JSON, validates field types, and encodes the protobuf binary format. This is more expensive than passthrough proxying.
  • Protobuf → JSON — The gateway decodes the binary protobuf response and produces JSON. This is typically faster than the reverse because protobuf decoding is simpler than JSON parsing.

If your clients can speak gRPC natively, passthrough proxying (no transcoding) eliminates this overhead entirely.

HTTP/2 Connection Management

gRPC’s use of HTTP/2 creates specific performance considerations at the gateway:

  • Connection pooling — The gateway should maintain multiple HTTP/2 connections to each backend to avoid bottlenecking on a single connection’s flow control window.
  • Max concurrent streams — HTTP/2 limits the number of concurrent streams per connection (typically 100–250). The gateway should open additional connections when this limit is reached.
  • Keep-alive and idle timeouts — Long-lived gRPC connections can be silently dropped by intermediate network devices. Configure HTTP/2 keep-alive pings at the gateway to detect and recover from broken connections.
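The pool-sizing arithmetic behind the first two points is simple ceiling division; a hypothetical sizing helper makes it concrete (the default limit of 100 streams per connection matches common server defaults, but is configurable):

```python
import math

def connections_needed(in_flight_streams, max_concurrent_streams=100):
    """How many HTTP/2 connections a gateway needs so that every active
    gRPC stream fits under the per-connection concurrent-stream limit."""
    if in_flight_streams <= 0:
        return 1  # keep one warm connection for the next request
    return math.ceil(in_flight_streams / max_concurrent_streams)

print(connections_needed(250))   # 250 streams need 3 connections
print(connections_needed(80))    # fits on a single connection
```

In practice gateways open connections ahead of demand rather than exactly at the limit, since each new connection pays a TLS and HTTP/2 settings handshake before it can carry traffic.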

Streaming Performance

Streaming RPCs (server streaming, client streaming, and bidirectional streaming) hold connections open for extended periods. The gateway must:

  • Avoid buffering entire streams in memory — process messages incrementally
  • Support backpressure so a slow consumer does not overwhelm the gateway
  • Handle stream cancellation gracefully when clients disconnect

Choosing a gRPC API Gateway

The right gateway depends on your architecture, team expertise, and operational requirements. Here is how the major options compare for gRPC workloads.

Envoy Proxy

Envoy is the de facto standard for gRPC proxying. Originally built at Lyft, it provides first-class support for gRPC including JSON transcoding, gRPC-Web bridging, health checking, and L7 load balancing. Envoy is the data plane for service meshes like Istio and powers projects like Envoy Gateway.

Strengths: Best-in-class gRPC support, high performance, mature ecosystem.

Trade-offs: Requires Kubernetes and a control plane (Istio or Envoy Gateway) for production use. Protobuf-based configuration has a steep learning curve. No built-in developer portal, API key management, or API monetization.

Zuplo

Zuplo is an edge-native API gateway that supports HTTP/2 and can proxy gRPC traffic across its network of over 300 global data centers. Zuplo’s strength for gRPC architectures is in managing the REST-facing side of a gRPC system — handling authentication, rate limiting, and developer experience for the external API that fronts your gRPC backends.

Strengths: Fully managed with zero infrastructure to operate. TypeScript programmability for custom request handling. Built-in API key management, rate limiting, and automatic developer portal generation. Deploys to 300+ edge locations with sub-50ms latency globally.

Trade-offs: Does not provide built-in gRPC-JSON transcoding or gRPC-Web bridging at the gateway layer. If your primary protocol is gRPC end-to-end, a proxy like Envoy is the stronger choice for the gRPC data plane.

Kong Gateway

Kong supports gRPC proxying and gRPC-Web through dedicated plugins. It can terminate gRPC traffic and apply rate limiting, authentication, and logging policies.

Strengths: Large plugin ecosystem, supports multiple protocols alongside gRPC, available as both open-source and enterprise.

Trade-offs: gRPC support requires separate plugin configuration. Lua-based plugins are less familiar to most development teams. Self-hosted deployments need PostgreSQL (Cassandra support was removed in Kong 3.4).

Traefik

Traefik supports gRPC proxying over HTTP/2 and integrates with Kubernetes Ingress and the Gateway API (including GRPCRoute). It provides automatic service discovery and certificate management.

Strengths: Easy Kubernetes integration, automatic Let’s Encrypt certificates, built-in support for GRPCRoute.

Trade-offs: Limited gRPC-specific features compared to Envoy. No built-in transcoding — you need a separate grpc-gateway or Envoy sidecar for gRPC-to-REST translation.

NGINX

NGINX supports gRPC proxying via its ngx_http_grpc_module and is widely deployed and well-understood by operations teams.

Strengths: Ubiquitous, high performance, extensive documentation.

Trade-offs: No built-in gRPC-JSON transcoding. Configuration is file-based and less developer-friendly. No developer portal or API key management.

The Hybrid Approach

Many production architectures combine multiple gateways. A common pattern:

  • Envoy handles gRPC-specific concerns — transcoding, gRPC-Web, L7 load balancing, and health checking for internal service-to-service traffic.
  • Zuplo sits at the edge, managing the external REST API that consumers interact with — handling authentication, rate limiting, developer portal, and API key lifecycle.

This gives you best-in-class gRPC support internally and a developer-friendly, fully managed API experience externally. For a deeper comparison of Envoy and Zuplo, see Zuplo vs Envoy Proxy.

Best Practices

Start with Passthrough Proxying

If your clients can speak gRPC natively, proxy gRPC traffic through the gateway without transcoding. This preserves gRPC’s performance benefits and avoids the serialization overhead of JSON conversion. Add transcoding only for clients that genuinely need REST/JSON.

Use L7 Load Balancing from Day One

Do not wait for uneven load distribution to become a production incident. Configure Layer 7 load balancing for gRPC traffic as part of your initial deployment. In Kubernetes, this means using an L7-aware ingress controller or service mesh rather than relying on kube-proxy.

Centralize Authentication at the Gateway

Let the gateway handle credential validation (API keys, JWTs, mTLS) and pass authenticated identity to backends via gRPC metadata. This keeps your services focused on business logic and ensures consistent security across your entire API surface.

Implement Health Checks with the gRPC Protocol

Use the standard gRPC health checking protocol for your services, not HTTP health endpoints. This ensures your gateway and load balancer can accurately assess service health using the same protocol as your production traffic.

Monitor the Four Golden Signals

Track request rate, error rate, latency, and saturation at the gateway for every gRPC service and method. These metrics are your first line of defense against performance regressions and capacity issues.

Plan Your Transcoding Strategy

If you need both gRPC and REST interfaces, decide early whether to use gateway-level transcoding (Envoy, grpc-gateway) or maintain separate REST and gRPC service implementations. Gateway-level transcoding reduces code duplication but adds a dependency on annotation correctness.

Further Reading

Ready to manage your API gateway for gRPC and REST workloads? Sign up for Zuplo to deploy a fully managed API gateway in minutes, or explore these related resources: