
gRPC API Gateway: Protocol Translation, Load Balancing, and Observability

Nate Totten
March 16, 2026
14 min read

Learn how API gateways handle gRPC traffic — from protocol translation and load balancing to authentication, observability, and gRPC-Web support.

gRPC has become the default choice for high-performance microservices communication. Its binary serialization with Protocol Buffers, native HTTP/2 transport, and built-in streaming make it significantly faster than REST for internal service-to-service calls. But the moment you need to expose gRPC services to external consumers, browsers, or mobile clients, you run into a wall of compatibility challenges.

That is where an API gateway comes in. A gRPC API gateway sits between your clients and your gRPC backends, handling protocol translation, load balancing, authentication, and observability — so your services can speak gRPC internally while remaining accessible to the outside world.


Why gRPC Needs a Gateway

gRPC works beautifully inside a controlled environment where every client and server speaks the same protocol. The problems start when you step outside that boundary:

  • Browsers cannot call gRPC directly — The browser’s Fetch API does not expose the low-level HTTP/2 framing that gRPC requires. There is no way to force HTTP/2 or access raw HTTP/2 frames from JavaScript. This means web applications need a translation layer to interact with gRPC services.
  • External consumers expect REST/JSON — Most third-party developers, partners, and legacy systems are built around REST conventions. Asking them to adopt gRPC, Protocol Buffers, and code generation creates unnecessary friction for API adoption.
  • Load balancing breaks with HTTP/2 — gRPC multiplexes many requests over a single long-lived TCP connection. Traditional Layer 4 (L4) load balancers distribute connections, not individual requests, which means all traffic from one client can land on a single backend pod while others sit idle.
  • Cross-cutting concerns need a centralized layer — Authentication, rate limiting, logging, and metrics should not be duplicated in every gRPC service. A gateway enforces these policies consistently across your entire API surface.

For a deeper comparison of when to use each protocol, see REST or gRPC? A Guide to Efficient API Design.

gRPC-to-REST Transcoding

The most common gateway pattern for gRPC is transcoding: automatically translating between HTTP/JSON requests and gRPC calls. This lets you maintain a single gRPC service definition while serving both gRPC and REST clients.

How Transcoding Works

Transcoding relies on annotations in your .proto files that map gRPC methods to HTTP endpoints. The standard approach uses google.api.http annotations (declared in Google's google/api/http.proto), whose mapping rules are described in the AIP-127 transcoding specification:

protobuf
syntax = "proto3";

import "google/api/annotations.proto";

service UserService {
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/v1/users/{user_id}"
    };
  }

  rpc CreateUser(CreateUserRequest) returns (User) {
    option (google.api.http) = {
      post: "/v1/users"
      body: "*"
    };
  }
}

When a client sends GET /v1/users/123, the gateway translates this into a GetUser gRPC call with user_id set to 123. The gRPC response is serialized back to JSON and returned to the client. The client never needs to know that gRPC is involved.
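The path-matching half of this translation can be sketched in a few lines. This is an illustrative simplification, not how any particular gateway is implemented: it compiles a google.api.http-style path template into a regex and extracts the named fields that would populate the gRPC request message. The route table and RPC names mirror the .proto annotations above.

```python
import re

def compile_template(template):
    """Convert a path template like '/v1/users/{user_id}' into a regex
    with a named capture group for each field placeholder."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")

def match_route(method, path, routes):
    """Return (rpc_name, extracted_fields) for the first matching route,
    or None if nothing matches."""
    for (http_method, template), rpc_name in routes.items():
        if http_method != method:
            continue
        m = compile_template(template).match(path)
        if m:
            return rpc_name, m.groupdict()
    return None

# Routes mirroring the annotations in the .proto above
routes = {
    ("GET", "/v1/users/{user_id}"): "GetUser",
    ("POST", "/v1/users"): "CreateUser",
}

print(match_route("GET", "/v1/users/123", routes))
# ('GetUser', {'user_id': '123'})
```

A real transcoder also validates field types against the descriptor set and maps query parameters and request bodies, but the core routing step is this template match.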

Transcoding Implementations

Several tools handle gRPC-JSON transcoding, each with different trade-offs:

  • Envoy’s gRPC-JSON Transcoder filter — Envoy reads your protobuf descriptor set and performs transcoding inline as a filter. This is the most common approach in Kubernetes environments where Envoy is already the data plane proxy.
  • grpc-gateway — An open-source Go project that generates a reverse-proxy server from your annotated .proto files. It produces a standalone Go HTTP server that translates REST calls to gRPC. This is a popular choice in Go ecosystems.
  • ASP.NET Core gRPC JSON transcoding — Microsoft’s built-in transcoding for .NET applications. It runs inside the same ASP.NET Core process as your gRPC service, avoiding the overhead of a separate proxy.
  • Google Cloud Endpoints — Google’s managed service that provides transcoding for gRPC services deployed on GCP, using the same google.api.http annotations.
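For the Envoy approach, the transcoder is enabled as an HTTP filter that points at a compiled protobuf descriptor set. The sketch below assumes a descriptor file generated with `protoc --descriptor_set_out`; the file path and service name are placeholders for your own build artifacts.

```yaml
http_filters:
  - name: envoy.filters.http.grpc_json_transcoder
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
      # Descriptor set produced by: protoc --descriptor_set_out=user_service.pb ...
      proto_descriptor: "/etc/envoy/user_service.pb"
      services: ["UserService"]
      print_options:
        always_print_primitive_fields: true
```

Requests matching the google.api.http annotations are transcoded; other requests pass through to the router unchanged.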

For a hands-on walkthrough of the grpc-gateway approach, including code generation and OpenAPI documentation, see our blog post on gRPC API Gateway: Bridging the Gap Between REST and gRPC.

REST-to-gRPC Bridging

Transcoding works in both directions. When your backend services use gRPC but your external API contract is REST, the gateway accepts REST requests and forwards them as gRPC calls. This is particularly useful when:

  • Migrating from REST to gRPC incrementally — You can move individual services to gRPC without changing your public API contract. Consumers continue calling REST endpoints while the gateway handles the translation.
  • Supporting legacy integrations — Partners and internal systems that cannot adopt gRPC continue using the REST interface they already depend on.
  • Providing a unified API surface — A single OpenAPI specification describes your REST endpoints, while the gateway routes traffic to the appropriate gRPC backends.

The gateway handles the full request lifecycle: parsing JSON request bodies into Protocol Buffer messages, mapping HTTP path and query parameters to gRPC fields, invoking the gRPC service, and serializing the protobuf response back to JSON.

gRPC Load Balancing

Load balancing gRPC traffic is fundamentally different from load balancing REST APIs, and getting it wrong is one of the most common operational issues teams face when adopting gRPC.

The HTTP/2 Multiplexing Problem

REST APIs typically use HTTP/1.1, where each request opens a new TCP connection (or reuses one from a pool with clear request boundaries). Load balancers distribute these connections across backend instances, and because each connection carries roughly one request, the load spreads evenly.

gRPC uses HTTP/2, which multiplexes many concurrent requests (streams) over a single TCP connection. A gRPC client opens one connection to a backend and sends all its requests over that connection. If you are using an L4 load balancer, it sees one connection and routes all traffic to one backend — even if you have ten other backends sitting idle.

This creates the “sticky connection” problem that is especially visible in Kubernetes, where the default kube-proxy load balancer operates at Layer 4.

L4 vs L7 Load Balancing for gRPC

  • Layer 4 (transport) — Distributes based on TCP connections. Fast and lightweight, but blind to individual gRPC requests within a connection. For gRPC, L4 load balancing results in uneven request distribution because multiple RPCs share one connection.
  • Layer 7 (application) — Understands HTTP/2 and can distribute individual gRPC requests across backends. The load balancer terminates the client’s HTTP/2 connection and opens separate connections to each backend, routing each RPC independently.

For gRPC workloads, Layer 7 load balancing is almost always the right choice. The additional processing overhead is minimal compared to the cost of sending all traffic to a single backend.
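The skew is easy to see in a toy simulation. This hypothetical model pins each client's connection to one backend (the L4 behavior) versus distributing individual RPCs round-robin (the L7 behavior); the numbers are arbitrary but the imbalance is the point.

```python
import random

def l4_distribution(clients, rpcs_per_client, backends):
    """L4: each client's long-lived connection is pinned to one backend,
    so every RPC multiplexed on that connection lands on the same pod."""
    load = {b: 0 for b in range(backends)}
    for _ in range(clients):
        pinned = random.randrange(backends)
        load[pinned] += rpcs_per_client
    return load

def l7_distribution(clients, rpcs_per_client, backends):
    """L7: the proxy terminates HTTP/2 and round-robins each RPC."""
    load = {b: 0 for b in range(backends)}
    for i in range(clients * rpcs_per_client):
        load[i % backends] += 1
    return load

random.seed(7)
print(l4_distribution(clients=2, rpcs_per_client=500, backends=4))
print(l7_distribution(clients=2, rpcs_per_client=500, backends=4))
```

With two clients and four backends, L4 balancing leaves at least two backends completely idle no matter how the connections land, while L7 balancing gives every backend exactly 250 RPCs.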

Approaches to gRPC Load Balancing

There are three main strategies, each suited to different architectures:

Proxy-based (server-side) load balancing places a Layer 7 proxy between clients and backends. The proxy terminates the client’s HTTP/2 connection and distributes individual RPCs. Envoy, Linkerd, and NGINX all support this pattern. This is the simplest approach and works well for most deployments.

Client-side load balancing uses gRPC’s built-in name resolution and load balancing APIs. The client discovers backend instances (typically via DNS or a service registry) and distributes RPCs directly, without a proxy in the path. This eliminates the proxy hop but adds complexity to client configuration.

Service mesh deploys a sidecar proxy alongside each service instance. The sidecar handles L7 load balancing, retries, circuit breaking, and mTLS transparently. Istio (with Envoy sidecars) and Linkerd are the most common service mesh options for gRPC workloads.

For more on gateway-level traffic management patterns, see API Gateway Traffic Management: Routing and Load Balancing.

gRPC Health Checking and Service Discovery

gRPC defines its own health checking protocol, separate from HTTP health checks. An API gateway or load balancer that manages gRPC backends needs to speak this protocol to determine which instances are healthy.

The gRPC Health Checking Protocol

The protocol defines a grpc.health.v1.Health service with a Check RPC. The ServingStatus enum includes four values: UNKNOWN, SERVING, NOT_SERVING, and SERVICE_UNKNOWN (used by the Watch RPC). In practice, the Check RPC returns one of two statuses for known services:

  • SERVING — The service is healthy and accepting traffic.
  • NOT_SERVING — The service is unhealthy and should be removed from the load balancing pool.

If the requested service name is not registered, the server returns a NOT_FOUND gRPC status.

Unlike HTTP health checks (where you hit a /healthz endpoint and check for a 200 status code), gRPC health checks require a client that can make gRPC calls. This means your gateway or load balancer must support the gRPC health checking protocol natively.
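The protocol itself is small. Its service definition, from the standard grpc.health.v1 health.proto, looks like this:

```protobuf
syntax = "proto3";

package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
  }
  ServingStatus status = 1;
}

service Health {
  // Unary check of the named service's current status.
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);

  // Server streaming: pushes a new status whenever it changes.
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
```

An empty service name in the request conventionally queries the overall health of the server process rather than an individual service.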

Kubernetes Integration

Kubernetes has supported native gRPC health probes since version 1.24 (beta, enabled by default), with the feature graduating to GA in version 1.27. You can configure liveness, readiness, and startup probes that call the gRPC health service directly:

yaml
livenessProbe:
  grpc:
    port: 50051
    service: "my-service"
  initialDelaySeconds: 10
  periodSeconds: 10

For older Kubernetes versions, the grpc-health-probe utility provides the same functionality as a sidecar or init container. The Kubernetes Gateway API also added GRPCRoute as a stable resource in v1.1, giving gRPC services first-class routing support alongside HTTP routes.

Authentication and Authorization for gRPC

Authentication at the gateway is one of the strongest reasons to use a gRPC API gateway. Without it, every service must implement its own credential validation — leading to inconsistent security, duplicated logic, and a larger attack surface.

Common Authentication Patterns

API key authentication — The gateway validates an API key (typically in a header or metadata field) before forwarding the request to the backend. This is the simplest approach for external APIs and works identically for REST and gRPC traffic when the gateway handles both.

JWT validation — The gateway verifies a JSON Web Token’s signature, expiry, and claims at the edge. Valid tokens are forwarded (often with claims extracted into gRPC metadata), and expired or invalid tokens are rejected before they reach your services.

mTLS (mutual TLS) — Both the client and server present certificates for mutual authentication. This is common for service-to-service communication where both parties are within the same trust domain. The gateway can terminate mTLS at the edge and use internal mTLS or plaintext to communicate with backends.

OAuth 2.0 token introspection — The gateway validates bearer tokens by introspecting them against an authorization server. This is common in enterprise environments where a centralized identity provider manages access tokens.

gRPC Metadata for Identity Propagation

Once the gateway authenticates a request, it needs to pass the authenticated identity to the backend service. In gRPC, this is done through metadata — key-value pairs attached to each RPC call, analogous to HTTP headers. The gateway extracts identity information (user ID, scopes, roles) from the validated credential and injects it as gRPC metadata that backends can trust without re-validating.
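A minimal sketch of that injection step, with a hypothetical `x-auth-*` key prefix (metadata keys and the claim names here are illustrative, not a standard):

```python
def claims_to_metadata(claims):
    """Map validated JWT claims to gRPC metadata pairs.

    gRPC metadata keys must be lowercase. A gateway-owned prefix such
    as 'x-auth-*' namespaces the injected identity so backends can
    trust it, provided the gateway strips any client-supplied metadata
    under the same prefix to prevent spoofing.
    """
    metadata = [
        ("x-auth-user-id", str(claims["sub"])),
        ("x-auth-scopes", " ".join(claims.get("scope", "").split())),
    ]
    if "roles" in claims:
        metadata.append(("x-auth-roles", ",".join(claims["roles"])))
    return metadata

claims = {"sub": "user-123", "scope": "read:users write:users", "roles": ["admin"]}
print(claims_to_metadata(claims))
```

The resulting pairs are attached to the outgoing RPC the same way HTTP headers would be forwarded by a REST gateway.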

gRPC Observability Through the Gateway

The gateway is the natural instrumentation point for gRPC traffic. Because every request passes through it, you get comprehensive visibility without modifying your application code.

Distributed Tracing

gRPC supports distributed tracing through OpenTelemetry, which provides standardized trace propagation across services. The gateway can:

  • Inject trace context into incoming requests that lack it
  • Propagate existing trace headers to backend gRPC calls
  • Record spans for gateway-level processing (authentication, rate limiting, transcoding)

This gives you end-to-end visibility into request latency, from the client through the gateway to the backend service and back.

Metrics

Key gRPC metrics to capture at the gateway include:

  • Request rate — RPCs per second, broken down by service and method
  • Error rate — Percentage of RPCs returning non-OK gRPC status codes
  • Latency distribution — P50, P95, and P99 latency for each method
  • Active streams — Number of concurrent gRPC streams (important for streaming RPCs)

These metrics map to the RED method (Rate, Errors, Duration) that is standard for microservices monitoring.
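As an illustration of the aggregation involved (real gateways use histogram-based metrics libraries, not a sorted list), a toy per-method RED aggregator might look like this:

```python
from bisect import insort

class RpcStats:
    """Toy RED-style aggregator for one gRPC method (illustrative only)."""

    def __init__(self):
        self.latencies = []   # kept sorted, in milliseconds
        self.total = 0
        self.errors = 0

    def record(self, latency_ms, status="OK"):
        """Record one finished RPC with its gRPC status code."""
        self.total += 1
        if status != "OK":
            self.errors += 1
        insort(self.latencies, latency_ms)

    def error_rate(self):
        return self.errors / self.total if self.total else 0.0

    def percentile(self, p):
        """Nearest-rank percentile over observed latencies."""
        idx = max(0, round(p / 100 * len(self.latencies)) - 1)
        return self.latencies[idx]

stats = RpcStats()
for ms in [5, 7, 8, 9, 12, 15, 20, 45, 80, 200]:
    stats.record(ms)
stats.record(30, status="UNAVAILABLE")  # one failed RPC out of eleven
print(stats.error_rate())
print(stats.percentile(95))
```

Production systems replace the sorted list with fixed-bucket histograms so that percentiles stay cheap at high request rates.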

Logging

The gateway can log gRPC request metadata (method, service, status code, latency) without inspecting the binary protobuf payload. For debugging, some gateways support logging the decoded JSON representation of protobuf messages, though this adds overhead and should be used selectively.

gRPC-Web and Browser Support

Browsers cannot make native gRPC calls because the Fetch API does not expose HTTP/2 framing at the level gRPC requires. The gRPC-Web protocol solves this by defining a compatibility layer that works over HTTP/1.1 and HTTP/2 without requiring low-level frame access.

How gRPC-Web Works

gRPC-Web modifies the standard gRPC protocol in several ways:

  • It supports both HTTP/1.1 and HTTP/2 transports
  • gRPC trailers are sent in the response body instead of HTTP/2 trailing headers (which browsers cannot access)
  • A proxy translates between the gRPC-Web wire format and standard gRPC

The client-side library (grpc-web on npm) handles serialization and deserialization of Protocol Buffer messages and manages the gRPC-Web framing. Server-side, a proxy (most commonly Envoy) receives gRPC-Web requests and forwards them as standard gRPC to your backend services.
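The framing itself is compact: every gRPC-Web frame carries a 5-byte prefix of one flag byte plus a 4-byte big-endian payload length, and the trailer frame is marked by setting the most significant bit of the flag byte. A small decoder sketch (the example payload bytes are made up):

```python
import struct

GRPC_WEB_TRAILER_FLAG = 0x80

def decode_grpc_web_frames(body):
    """Split a gRPC-Web response body into (is_trailer, payload) frames.

    Each frame has a 5-byte prefix: 1 flag byte (MSB set marks the
    trailer frame) and a 4-byte big-endian payload length.
    """
    frames, offset = [], 0
    while offset < len(body):
        flags, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        payload = body[offset:offset + length]
        offset += length
        frames.append((bool(flags & GRPC_WEB_TRAILER_FLAG), payload))
    return frames

# One message frame followed by one trailer frame, as a browser receives them
msg = b"\x0a\x03Ann"                 # illustrative protobuf payload
trailers = b"grpc-status:0\r\n"
body = (bytes([0]) + struct.pack(">I", len(msg)) + msg
        + bytes([GRPC_WEB_TRAILER_FLAG]) + struct.pack(">I", len(trailers)) + trailers)
print(decode_grpc_web_frames(body))
```

Because the trailers travel in the body, a plain Fetch response stream is enough to read both the messages and the final gRPC status.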

Browser Streaming Limitations

gRPC-Web supports server-side streaming (the server sends a stream of messages to the client), but client-side streaming and bidirectional streaming are not supported in browsers. This is a limitation of browser APIs: the Fetch API specification includes streaming request bodies, but browser support remains partial and inconsistent, so gRPC-Web cannot depend on them.

If your application requires bidirectional streaming from the browser, consider using WebSockets alongside gRPC-Web, or evaluate ConnectRPC — a newer protocol family that offers enhanced browser support with gRPC-compatible backends.

gRPC-Web Proxies

Several proxies support the gRPC-Web protocol:

  • Envoy — The official default proxy for gRPC-Web, with built-in support that requires minimal configuration. New gRPC-Web features are implemented in Envoy first.
  • grpc-web Go proxy — A lightweight alternative for Go applications.
  • Apache APISIX — Includes a gRPC-Web plugin for environments already using APISIX.
  • NGINX — Supports native gRPC proxying but does not have built-in gRPC-Web transcoding. A separate proxy like Envoy is needed for gRPC-Web support.

Performance Considerations

gRPC is fast by default, but the gateway layer introduces trade-offs that you should understand.

Protobuf Serialization Overhead

When the gateway performs transcoding (converting between JSON and protobuf), it adds serialization and deserialization overhead. For high-throughput APIs, this can be significant:

  • JSON → Protobuf — The gateway parses JSON, validates field types, and encodes the protobuf binary format. This is more expensive than passthrough proxying.
  • Protobuf → JSON — The gateway decodes the binary protobuf response and produces JSON. This is typically faster than the reverse because protobuf decoding is simpler than JSON parsing.

If your clients can speak gRPC natively, passthrough proxying (no transcoding) eliminates this overhead entirely.

HTTP/2 Connection Management

gRPC’s use of HTTP/2 creates specific performance considerations at the gateway:

  • Connection pooling — The gateway should maintain multiple HTTP/2 connections to each backend to avoid bottlenecking on a single connection’s flow control window.
  • Max concurrent streams — HTTP/2 limits the number of concurrent streams per connection (typically 100–250). The gateway should open additional connections when this limit is reached.
  • Keep-alive and idle timeouts — Long-lived gRPC connections can be silently dropped by intermediate network devices. Configure HTTP/2 keep-alive pings at the gateway to detect and recover from broken connections.
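The pool-sizing arithmetic behind the first two points is simple ceiling division; a hypothetical sizing helper makes it concrete (the default limit of 100 streams per connection matches common server defaults, but is configurable):

```python
import math

def connections_needed(in_flight_streams, max_concurrent_streams=100):
    """How many HTTP/2 connections a gateway needs so that every active
    gRPC stream fits under the per-connection concurrent-stream limit."""
    if in_flight_streams <= 0:
        return 1  # keep one warm connection for the next request
    return math.ceil(in_flight_streams / max_concurrent_streams)

print(connections_needed(250))   # 250 streams need 3 connections
print(connections_needed(80))    # fits on a single connection
```

In practice gateways open connections ahead of demand rather than exactly at the limit, since each new connection pays a TLS and HTTP/2 settings handshake before it can carry traffic.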

Streaming Performance

Streaming RPCs (server streaming, client streaming, and bidirectional streaming) hold connections open for extended periods. The gateway must:

  • Avoid buffering entire streams in memory — process messages incrementally
  • Support backpressure so a slow consumer does not overwhelm the gateway
  • Handle stream cancellation gracefully when clients disconnect

Choosing a gRPC API Gateway

The right gateway depends on your architecture, team expertise, and operational requirements. Here is how the major options compare for gRPC workloads.

Envoy Proxy

Envoy is the de facto standard for gRPC proxying. Originally built at Lyft, it provides first-class support for gRPC including JSON transcoding, gRPC-Web bridging, health checking, and L7 load balancing. Envoy is the data plane for service meshes like Istio and powers projects like Envoy Gateway.

Strengths: Best-in-class gRPC support, high performance, mature ecosystem.

Trade-offs: Requires Kubernetes and a control plane (Istio or Envoy Gateway) for production use. Protobuf-based configuration has a steep learning curve. No built-in developer portal, API key management, or API monetization.

Zuplo

Zuplo is an edge-native API gateway that supports HTTP/2 and can proxy gRPC traffic across its network of over 300 global data centers. Zuplo’s strength for gRPC architectures is in managing the REST-facing side of a gRPC system — handling authentication, rate limiting, and developer experience for the external API that fronts your gRPC backends.

Strengths: Fully managed with zero infrastructure to operate. TypeScript programmability for custom request handling. Built-in API key management, rate limiting, and automatic developer portal generation. Deploys to 300+ edge locations with sub-50ms latency globally.

Trade-offs: Does not provide built-in gRPC-JSON transcoding or gRPC-Web bridging at the gateway layer. If your primary protocol is gRPC end-to-end, a proxy like Envoy is the stronger choice for the gRPC data plane.

Kong Gateway

Kong supports gRPC proxying and gRPC-Web through dedicated plugins. It can terminate gRPC traffic and apply rate limiting, authentication, and logging policies.

Strengths: Large plugin ecosystem, supports multiple protocols alongside gRPC, available as both open-source and enterprise.

Trade-offs: gRPC support requires separate plugin configuration. Lua-based plugins are less familiar to most development teams. Self-hosted deployments need PostgreSQL (Cassandra support was removed in Kong 3.4).

Traefik

Traefik supports gRPC proxying over HTTP/2 and integrates with Kubernetes Ingress and the Gateway API (including GRPCRoute). It provides automatic service discovery and certificate management.

Strengths: Easy Kubernetes integration, automatic Let’s Encrypt certificates, built-in support for GRPCRoute.

Trade-offs: Limited gRPC-specific features compared to Envoy. No built-in transcoding — you need a separate grpc-gateway or Envoy sidecar for gRPC-to-REST translation.

NGINX

NGINX supports gRPC proxying via its ngx_http_grpc_module and is widely deployed and well-understood by operations teams.

Strengths: Ubiquitous, high performance, extensive documentation.

Trade-offs: No built-in gRPC-JSON transcoding. Configuration is file-based and less developer-friendly. No developer portal or API key management.

The Hybrid Approach

Many production architectures combine multiple gateways. A common pattern:

  • Envoy handles gRPC-specific concerns — transcoding, gRPC-Web, L7 load balancing, and health checking for internal service-to-service traffic.
  • Zuplo sits at the edge, managing the external REST API that consumers interact with — handling authentication, rate limiting, developer portal, and API key lifecycle.

This gives you best-in-class gRPC support internally and a developer-friendly, fully managed API experience externally. For a deeper comparison of Envoy and Zuplo, see Zuplo vs Envoy Proxy.

Best Practices

Start with Passthrough Proxying

If your clients can speak gRPC natively, proxy gRPC traffic through the gateway without transcoding. This preserves gRPC’s performance benefits and avoids the serialization overhead of JSON conversion. Add transcoding only for clients that genuinely need REST/JSON.

Use L7 Load Balancing from Day One

Do not wait for uneven load distribution to become a production incident. Configure Layer 7 load balancing for gRPC traffic as part of your initial deployment. In Kubernetes, this means using an L7-aware ingress controller or service mesh rather than relying on kube-proxy.

Centralize Authentication at the Gateway

Let the gateway handle credential validation (API keys, JWTs, mTLS) and pass authenticated identity to backends via gRPC metadata. This keeps your services focused on business logic and ensures consistent security across your entire API surface.

Implement Health Checks with the gRPC Protocol

Use the standard gRPC health checking protocol for your services, not HTTP health endpoints. This ensures your gateway and load balancer can accurately assess service health using the same protocol as your production traffic.

Monitor the Four Golden Signals

Track request rate, error rate, latency, and saturation at the gateway for every gRPC service and method. These metrics are your first line of defense against performance regressions and capacity issues.

Plan Your Transcoding Strategy

If you need both gRPC and REST interfaces, decide early whether to use gateway-level transcoding (Envoy, grpc-gateway) or maintain separate REST and gRPC service implementations. Gateway-level transcoding reduces code duplication but adds a dependency on annotation correctness.

Further Reading

Ready to manage your API gateway for gRPC and REST workloads? Sign up for Zuplo to deploy a fully managed API gateway in minutes, or explore these related resources: