---
title: "gRPC API Gateway: Protocol Translation, Load Balancing, and Observability"
description: "Learn how API gateways handle gRPC traffic — from protocol translation and load balancing to authentication, observability, and gRPC-Web support."
canonicalUrl: "https://zuplo.com/learning-center/grpc-api-gateway-guide"
pageType: "learning-center"
authors: "nate"
tags: "gRPC, API Gateway"
image: "https://zuplo.com/og?text=gRPC%20API%20Gateway%3A%20Protocol%20Translation%2C%20Load%20Balancing%2C%20and%20Observability"
---
gRPC has become the default choice for high-performance microservices
communication. Its binary serialization with Protocol Buffers, native HTTP/2
transport, and built-in streaming make it significantly faster than REST for
internal service-to-service calls. But the moment you need to expose gRPC
services to external consumers, browsers, or mobile clients, you run into a wall
of compatibility challenges.

That is where an API gateway comes in. A gRPC API gateway sits between your
clients and your gRPC backends, handling protocol translation, load balancing,
authentication, and observability — so your services can speak gRPC internally
while remaining accessible to the outside world.

## Contents

- [Why gRPC Needs a Gateway](#why-grpc-needs-a-gateway)
- [gRPC-to-REST Transcoding](#grpc-to-rest-transcoding)
- [REST-to-gRPC Bridging](#rest-to-grpc-bridging)
- [gRPC Load Balancing](#grpc-load-balancing)
- [gRPC Health Checking and Service Discovery](#grpc-health-checking-and-service-discovery)
- [Authentication and Authorization for gRPC](#authentication-and-authorization-for-grpc)
- [gRPC Observability Through the Gateway](#grpc-observability-through-the-gateway)
- [gRPC-Web and Browser Support](#grpc-web-and-browser-support)
- [Performance Considerations](#performance-considerations)
- [Choosing a gRPC API Gateway](#choosing-a-grpc-api-gateway)
- [Best Practices](#best-practices)

## Why gRPC Needs a Gateway

gRPC works beautifully inside a controlled environment where every client and
server speaks the same protocol. The problems start when you step outside that
boundary:

- **Browsers cannot call gRPC directly** — The browser's Fetch API does not
  expose the low-level HTTP/2 framing that gRPC requires. There is no way to
  force HTTP/2 or access raw HTTP/2 frames from JavaScript. This means web
  applications need a translation layer to interact with gRPC services.
- **External consumers expect REST/JSON** — Most third-party developers,
  partners, and legacy systems are built around REST conventions. Asking them to
  adopt gRPC, Protocol Buffers, and code generation creates unnecessary friction
  for API adoption.
- **Load balancing breaks with HTTP/2** — gRPC multiplexes many requests over a
  single long-lived TCP connection. Traditional Layer 4 (L4) load balancers
  distribute connections, not individual requests, which means all traffic from
  one client can land on a single backend pod while others sit idle.
- **Cross-cutting concerns need a centralized layer** — Authentication, rate
  limiting, logging, and metrics should not be duplicated in every gRPC service.
  A gateway enforces these policies consistently across your entire API surface.

For a deeper comparison of when to use each protocol, see
[REST or gRPC? A Guide to Efficient API Design](/learning-center/rest-or-grpc-guide).

## gRPC-to-REST Transcoding

The most common gateway pattern for gRPC is transcoding: automatically
translating between HTTP/JSON requests and gRPC calls. This lets you maintain a
single gRPC service definition while serving both gRPC and REST clients.

### How Transcoding Works

Transcoding relies on annotations in your `.proto` files that map gRPC methods
to HTTP endpoints. The standard approach uses `google.api.http` annotations
(defined in `google/api/http.proto`), whose transcoding behavior is documented
in Google's [AIP-127 specification](https://google.aip.dev/127):

```protobuf
syntax = "proto3";

import "google/api/annotations.proto";

service UserService {
  rpc GetUser(GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/v1/users/{user_id}"
    };
  }

  rpc CreateUser(CreateUserRequest) returns (User) {
    option (google.api.http) = {
      post: "/v1/users"
      body: "*"
    };
  }
}
```

When a client sends `GET /v1/users/123`, the gateway translates this into a
`GetUser` gRPC call with `user_id` set to `123`. The gRPC response is serialized
back to JSON and returned to the client. The client never needs to know that
gRPC is involved.
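To make the mapping concrete, here is a minimal sketch (in Python, with hypothetical route and method names) of how a transcoder might match a path template and extract request fields — real implementations like Envoy's filter work from compiled protobuf descriptors, not regexes:

```python
import re

def compile_template(template):
    """Turn a path template like '/v1/users/{user_id}' into a regex
    that captures the named variables as groups."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")

# Hypothetical route table: (HTTP verb, compiled template, gRPC method).
routes = [
    ("GET", compile_template("/v1/users/{user_id}"), "UserService.GetUser"),
    ("POST", compile_template("/v1/users"), "UserService.CreateUser"),
]

def transcode(verb, path):
    """Return the gRPC method and extracted request fields for a REST call."""
    for route_verb, regex, method in routes:
        match = regex.match(path)
        if route_verb == verb and match:
            return method, match.groupdict()
    return None, {}

method, fields = transcode("GET", "/v1/users/123")
# method == "UserService.GetUser", fields == {"user_id": "123"}
```

The extracted fields would then be merged with any JSON body and query parameters to build the protobuf request message.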

### Transcoding Implementations

Several tools handle gRPC-JSON transcoding, each with different trade-offs:

- **Envoy's gRPC-JSON Transcoder filter** — Envoy reads your protobuf descriptor
  set and performs transcoding inline as a filter. This is the most common
  approach in Kubernetes environments where Envoy is already the data plane
  proxy.
- **grpc-gateway** — An open-source Go project that generates a reverse-proxy
  server from your annotated `.proto` files. It produces a standalone Go HTTP
  server that translates REST calls to gRPC. This is a popular choice in Go
  ecosystems.
- **ASP.NET Core gRPC JSON transcoding** — Microsoft's built-in transcoding for
  .NET applications. It runs inside the same ASP.NET Core process as your gRPC
  service, avoiding the overhead of a separate proxy.
- **Google Cloud Endpoints** — Google's managed service that provides
  transcoding for gRPC services deployed on GCP, using the same
  `google.api.http` annotations.

For a hands-on walkthrough of the grpc-gateway approach, including code
generation and OpenAPI documentation, see our blog post on
[gRPC API Gateway: Bridging the Gap Between REST and gRPC](/blog/grpc-api-gateway).

## REST-to-gRPC Bridging

Transcoding works in both directions. When your backend services use gRPC but
your external API contract is REST, the gateway accepts REST requests and
forwards them as gRPC calls. This is particularly useful when:

- **Migrating from REST to gRPC incrementally** — You can move individual
  services to gRPC without changing your public API contract. Consumers continue
  calling REST endpoints while the gateway handles the translation.
- **Supporting legacy integrations** — Partners and internal systems that cannot
  adopt gRPC continue using the REST interface they already depend on.
- **Providing a unified API surface** — A single OpenAPI specification describes
  your REST endpoints, while the gateway routes traffic to the appropriate gRPC
  backends.

The gateway handles the full request lifecycle: parsing JSON request bodies into
Protocol Buffer messages, mapping HTTP path and query parameters to gRPC fields,
invoking the gRPC service, and serializing the protobuf response back to JSON.

## gRPC Load Balancing

Load balancing gRPC traffic is fundamentally different from load balancing REST
APIs, and getting it wrong is one of the most common operational issues teams
face when adopting gRPC.

### The HTTP/2 Multiplexing Problem

REST APIs typically use HTTP/1.1, where each request opens a new TCP connection
(or reuses one from a pool with clear request boundaries). Load balancers
distribute these connections across backend instances, and because each
connection carries roughly one request, the load spreads evenly.

gRPC uses HTTP/2, which multiplexes many concurrent requests (streams) over a
single TCP connection. A gRPC client opens one connection to a backend and sends
all its requests over that connection. If you are using an L4 load balancer, it
sees one connection and routes all traffic to one backend — even if you have ten
other backends sitting idle.

This creates the "sticky connection" problem that is especially visible in
Kubernetes, where the default `kube-proxy` load balancer operates at Layer 4.
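A toy simulation makes the skew obvious. Assume three backends and a round-robin L7 policy — both the pod names and the policy here are illustrative:

```python
from collections import Counter
from itertools import cycle

backends = ["pod-a", "pod-b", "pod-c"]

def l4_distribution(num_rpcs):
    """L4: the balancer picks a backend once per TCP connection, and a
    gRPC client multiplexes every RPC over that single connection."""
    chosen = backends[0]  # one connection -> one backend for all traffic
    return Counter({chosen: num_rpcs})

def l7_distribution(num_rpcs):
    """L7: the proxy terminates HTTP/2 and routes each RPC (stream)
    independently, here with simple round-robin."""
    rr = cycle(backends)
    return Counter(next(rr) for _ in range(num_rpcs))

l4 = l4_distribution(90)  # all 90 RPCs land on pod-a
l7 = l7_distribution(90)  # 30 RPCs per pod
```

The L4 result is exactly the "sticky connection" failure mode: two of the three pods receive nothing until the client opens a new connection.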

### L4 vs L7 Load Balancing for gRPC

- **Layer 4 (transport)** — Distributes based on TCP connections. Fast and
  lightweight, but blind to individual gRPC requests within a connection. For
  gRPC, L4 load balancing results in uneven request distribution because
  multiple RPCs share one connection.
- **Layer 7 (application)** — Understands HTTP/2 and can distribute individual
  gRPC requests across backends. The load balancer terminates the client's
  HTTP/2 connection and opens separate connections to each backend, routing each
  RPC independently.

For gRPC workloads, Layer 7 load balancing is almost always the right choice.
The additional processing overhead is minimal compared to the cost of sending
all traffic to a single backend.

### Approaches to gRPC Load Balancing

There are three main strategies, each suited to different architectures:

**Proxy-based (server-side) load balancing** places a Layer 7 proxy between
clients and backends. The proxy terminates the client's HTTP/2 connection and
distributes individual RPCs. Envoy, Linkerd, and NGINX all support this pattern.
This is the simplest approach and works well for most deployments.

**Client-side load balancing** uses gRPC's built-in name resolution and load
balancing APIs. The client discovers backend instances (typically via DNS or a
service registry) and distributes RPCs directly, without a proxy in the path.
This eliminates the proxy hop but adds complexity to client configuration.

**Service mesh** deploys a sidecar proxy alongside each service instance. The
sidecar handles L7 load balancing, retries, circuit breaking, and mTLS
transparently. Istio (with Envoy sidecars) and Linkerd are the most common
service mesh options for gRPC workloads.

For more on gateway-level traffic management patterns, see
[API Gateway Traffic Management: Routing and Load Balancing](/learning-center/api-gateway-traffic-management-routing-load-balancing).

## gRPC Health Checking and Service Discovery

gRPC defines its own
[health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md),
separate from HTTP health checks. An API gateway or load balancer that manages
gRPC backends needs to speak this protocol to determine which instances are
healthy.

### The gRPC Health Checking Protocol

The protocol defines a `grpc.health.v1.Health` service with a `Check` RPC. The
`ServingStatus` enum includes four values: `UNKNOWN`, `SERVING`, `NOT_SERVING`,
and `SERVICE_UNKNOWN` (used by the `Watch` RPC). In practice, the `Check` RPC
returns one of two statuses for known services:

- **SERVING** — The service is healthy and accepting traffic.
- **NOT_SERVING** — The service is unhealthy and should be removed from the load
  balancing pool.

If the requested service name is not registered, the server returns a
`NOT_FOUND` gRPC status.
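A minimal sketch of the server side of the `Check` RPC might look like this — plain Python standing in for a real gRPC handler, with a `KeyError` standing in for the `NOT_FOUND` status code a real server would return:

```python
# Status strings mirror grpc.health.v1.HealthCheckResponse.ServingStatus.
SERVING, NOT_SERVING = "SERVING", "NOT_SERVING"

class HealthRegistry:
    """Tracks per-service health, as the standard health service does."""

    def __init__(self):
        self._statuses = {}

    def set_status(self, service, status):
        self._statuses[service] = status

    def check(self, service):
        if service not in self._statuses:
            # A real gRPC server returns the NOT_FOUND status code here.
            raise KeyError("NOT_FOUND")
        return self._statuses[service]

registry = HealthRegistry()
registry.set_status("my.package.UserService", SERVING)
assert registry.check("my.package.UserService") == SERVING
```

The empty service name conventionally reports overall server health, which is what a gateway's backend probe typically queries.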

Unlike HTTP health checks (where you hit a `/healthz` endpoint and check for a
200 status code), gRPC health checks require a client that can make gRPC calls.
This means your gateway or load balancer must support the gRPC health checking
protocol natively.

### Kubernetes Integration

Kubernetes has supported native gRPC health probes since version 1.24 (beta,
enabled by default), with the feature graduating to GA in version 1.27. You can
configure liveness, readiness, and startup probes that call the gRPC health
service directly:

```yaml
livenessProbe:
  grpc:
    port: 50051
    service: "my-service"
  initialDelaySeconds: 10
  periodSeconds: 10
```

For older Kubernetes versions, the
[grpc-health-probe](https://github.com/grpc-ecosystem/grpc-health-probe) utility
provides the same functionality as a sidecar or init container. The Kubernetes
Gateway API also added `GRPCRoute` as a stable resource in v1.1, giving gRPC
services first-class routing support alongside HTTP routes.

## Authentication and Authorization for gRPC

Authentication at the gateway is one of the strongest reasons to use a gRPC API
gateway. Without it, every service must implement its own credential validation
— leading to inconsistent security, duplicated logic, and a larger attack
surface.

### Common Authentication Patterns

**API key authentication** — The gateway validates an API key (typically in a
header or metadata field) before forwarding the request to the backend. This is
the simplest approach for external APIs and works identically for REST and gRPC
traffic when the gateway handles both.

**JWT validation** — The gateway verifies a JSON Web Token's signature, expiry,
and claims at the edge. Valid tokens are forwarded (often with claims extracted
into gRPC metadata), and expired or invalid tokens are rejected before they
reach your services.

**mTLS (mutual TLS)** — Both the client and server present certificates for
mutual authentication. This is common for service-to-service communication where
both parties are within the same trust domain. The gateway can terminate mTLS at
the edge and use internal mTLS or plaintext to communicate with backends.

**OAuth 2.0 token introspection** — The gateway validates bearer tokens by
introspecting them against an authorization server. This is common in enterprise
environments where a centralized identity provider manages access tokens.

### gRPC Metadata for Identity Propagation

Once the gateway authenticates a request, it needs to pass the authenticated
identity to the backend service. In gRPC, this is done through
[metadata](https://grpc.io/docs/guides/metadata/) — key-value pairs attached to
each RPC call, analogous to HTTP headers. The gateway extracts identity
information (user ID, scopes, roles) from the validated credential and injects
it as gRPC metadata that backends can trust without re-validating.
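As a sketch, a gateway might map validated claims to metadata entries like this — the `x-user-id` and `x-scopes` keys are illustrative conventions, not part of any gRPC standard:

```python
def claims_to_metadata(claims):
    """Turn validated JWT claims into gRPC metadata key-value pairs.

    gRPC metadata is a sequence of (key, value) tuples, analogous to
    HTTP headers. The key names here are hypothetical.
    """
    return [
        ("x-user-id", claims["sub"]),
        ("x-scopes", claims.get("scope", "")),
    ]

md = claims_to_metadata({"sub": "user-42", "scope": "read write"})
# [('x-user-id', 'user-42'), ('x-scopes', 'read write')]
```

Because the gateway has already verified the credential, backends can read these entries directly — provided the network path between gateway and backend is trusted, so clients cannot inject the same keys themselves.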

## gRPC Observability Through the Gateway

The gateway is the natural instrumentation point for gRPC traffic. Because every
request passes through it, you get comprehensive visibility without modifying
your application code.

### Distributed Tracing

gRPC supports distributed tracing through
[OpenTelemetry](https://opentelemetry.io/), which provides standardized trace
propagation across services. The gateway can:

- Inject trace context into incoming requests that lack it
- Propagate existing trace headers to backend gRPC calls
- Record spans for gateway-level processing (authentication, rate limiting,
  transcoding)

This gives you end-to-end visibility into request latency, from the client
through the gateway to the backend service and back.

### Metrics

Key gRPC metrics to capture at the gateway include:

- **Request rate** — RPCs per second, broken down by service and method
- **Error rate** — Percentage of RPCs returning non-OK gRPC status codes
- **Latency distribution** — P50, P95, and P99 latency for each method
- **Active streams** — Number of concurrent gRPC streams (important for
  streaming RPCs)

These metrics map to the
[RED method](https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/)
(Rate, Errors, Duration) that is standard for microservices monitoring.
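For example, latency percentiles can be computed from raw samples with a nearest-rank calculation — a sketch; production gateways usually use histogram-based estimates rather than storing every sample:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over recorded latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 13, 220, 16, 12, 18, 14, 13]
p50 = percentile(latencies_ms, 50)   # the typical request
p99 = percentile(latencies_ms, 99)   # a single slow outlier dominates the tail
```

The gap between P50 and P99 here illustrates why averages hide tail latency: one slow RPC barely moves the mean but defines the P99.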

### Logging

The gateway can log gRPC request metadata (method, service, status code,
latency) without inspecting the binary protobuf payload. For debugging, some
gateways support logging the decoded JSON representation of protobuf messages,
though this adds overhead and should be used selectively.

## gRPC-Web and Browser Support

Browsers cannot make native gRPC calls because the Fetch API does not expose
HTTP/2 framing at the level gRPC requires. The
[gRPC-Web](https://github.com/grpc/grpc-web) protocol solves this by defining a
compatibility layer that works over HTTP/1.1 and HTTP/2 without requiring
low-level frame access.

### How gRPC-Web Works

gRPC-Web modifies the standard gRPC protocol in several ways:

- It supports both HTTP/1.1 and HTTP/2 transports
- gRPC trailers are sent in the response body instead of HTTP/2 trailing headers
  (which browsers cannot access)
- A proxy translates between the gRPC-Web wire format and standard gRPC

The client-side library (`grpc-web` on npm) handles serialization and
deserialization of Protocol Buffer messages and manages the gRPC-Web framing.
Server-side, a proxy (most commonly Envoy) receives gRPC-Web requests and
forwards them as standard gRPC to your backend services.
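The trailer relocation is visible in the wire format. Each gRPC-Web frame is a one-byte flags field, a four-byte big-endian length, and a payload; the trailers frame sets the high bit of the flags byte. A minimal encoder sketch:

```python
import struct

DATA, TRAILERS = 0x00, 0x80  # flags: high bit marks the trailers frame

def encode_frame(flags, payload):
    """One gRPC-Web frame: 1-byte flags, 4-byte big-endian length, payload."""
    return struct.pack(">BI", flags, len(payload)) + payload

message = b"\x08\x01"             # stand-in for serialized protobuf bytes
trailers = b"grpc-status: 0\r\n"  # trailers are HTTP-header-style text

body = encode_frame(DATA, message) + encode_frame(TRAILERS, trailers)
# The trailers frame travels inside the response body, where browser
# JavaScript can read it -- unlike real HTTP/2 trailing headers.
```

This is why a gRPC-Web proxy is mandatory: something has to move the status trailers between the body (browser side) and real HTTP/2 trailers (gRPC side).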

### Browser Streaming Limitations

gRPC-Web supports server-side streaming (the server sends a stream of messages
to the client), but client-side streaming and bidirectional streaming are not
supported in browsers. This is a limitation of browser APIs: the Fetch API
specification includes streaming request bodies, but browser support remains
incomplete — Chromium-based browsers ship half-duplex upload streaming, and no
browser exposes the full-duplex access that bidirectional gRPC streaming would
require.

If your application requires bidirectional streaming from the browser, consider
using WebSockets alongside gRPC-Web, or evaluate
[ConnectRPC](https://connectrpc.com/) — a newer protocol family that offers
enhanced browser support with gRPC-compatible backends.

### gRPC-Web Proxies

Several proxies support the gRPC-Web protocol:

- **Envoy** — The official default proxy for gRPC-Web, with built-in support
  that requires minimal configuration. New gRPC-Web features are implemented in
  Envoy first.
- **grpc-web Go proxy** — A lightweight alternative for Go applications.
- **Apache APISIX** — Includes a gRPC-Web plugin for environments already using
  APISIX.
- **NGINX** — Supports native gRPC proxying but does not have built-in gRPC-Web
  transcoding. A separate proxy like Envoy is needed for gRPC-Web support.

## Performance Considerations

gRPC is fast by default, but the gateway layer introduces trade-offs that you
should understand.

### Protobuf Serialization Overhead

When the gateway performs transcoding (converting between JSON and protobuf), it
adds serialization and deserialization overhead. For high-throughput APIs, this
can be significant:

- **JSON → Protobuf** — The gateway parses JSON, validates field types, and
  encodes the protobuf binary format. This is more expensive than passthrough
  proxying.
- **Protobuf → JSON** — The gateway decodes the binary protobuf response and
  produces JSON. This is typically faster than the reverse because protobuf
  decoding is simpler than JSON parsing.

If your clients can speak gRPC natively, passthrough proxying (no transcoding)
eliminates this overhead entirely.

### HTTP/2 Connection Management

gRPC's use of HTTP/2 creates specific performance considerations at the gateway:

- **Connection pooling** — The gateway should maintain multiple HTTP/2
  connections to each backend to avoid bottlenecking on a single connection's
  flow control window.
- **Max concurrent streams** — HTTP/2 limits the number of concurrent streams
  per connection (typically 100–250). The gateway should open additional
  connections when this limit is reached.
- **Keep-alive and idle timeouts** — Long-lived gRPC connections can be silently
  dropped by intermediate network devices. Configure HTTP/2 keep-alive pings at
  the gateway to detect and recover from broken connections.
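The stream-limit arithmetic is simple — a sketch, assuming a typical default of 100 concurrent streams per connection:

```python
import math

def connections_needed(concurrent_rpcs, max_streams_per_conn=100):
    """How many HTTP/2 connections a gateway needs to one backend to
    carry a target number of concurrent gRPC streams."""
    return math.ceil(concurrent_rpcs / max_streams_per_conn)

assert connections_needed(250) == 3  # 250 streams over 100-stream connections
assert connections_needed(80) == 1
```

In practice gateways open connections on demand as the stream limit is hit, but the steady-state count follows this ceiling.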

### Streaming Performance

Streaming RPCs (server streaming, client streaming, and bidirectional streaming)
hold connections open for extended periods. The gateway must:

- Avoid buffering entire streams in memory — process messages incrementally
- Support backpressure so a slow consumer does not overwhelm the gateway
- Handle stream cancellation gracefully when clients disconnect
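Incremental processing falls out naturally from pull-based iteration. A sketch using a plain Python generator as a stand-in for a server-streaming relay:

```python
def relay(upstream, transform=lambda m: m):
    """Forward stream messages one at a time instead of buffering them.

    `upstream` is any iterator of messages -- an illustrative stand-in
    for a gRPC server-streaming response. Because generators pull
    lazily, a slow consumer naturally slows the producer (backpressure),
    and memory stays bounded regardless of stream length.
    """
    for message in upstream:
        yield transform(message)

out = list(relay(iter(range(5)), transform=lambda m: m * 2))
# [0, 2, 4, 6, 8]
```

Real gateways achieve the same property with HTTP/2 flow-control windows rather than iterator semantics, but the principle is identical: never materialize the whole stream.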

## Choosing a gRPC API Gateway

The right gateway depends on your architecture, team expertise, and operational
requirements. Here is how the major options compare for gRPC workloads.

### Envoy Proxy

Envoy is the de facto standard for gRPC proxying. Originally built at Lyft, it
provides first-class support for gRPC including JSON transcoding, gRPC-Web
bridging, health checking, and L7 load balancing. Envoy is the data plane for
service meshes like Istio and powers projects like Envoy Gateway.

**Strengths**: Best-in-class gRPC support, high performance, mature ecosystem.

**Trade-offs**: Typically operated with a control plane (Istio or Envoy
Gateway) in production; standalone static configuration is possible but
complex. Protobuf-based configuration has a steep learning curve. No built-in
developer portal, API key management, or API monetization.

### Zuplo

[Zuplo](https://zuplo.com) is an edge-native API gateway that supports HTTP/2
and can proxy gRPC traffic across its network of over 300 global data centers.
Zuplo's strength for gRPC architectures is in managing the REST-facing side of a
gRPC system — handling authentication, rate limiting, and developer experience
for the external API that fronts your gRPC backends.

**Strengths**: Fully managed with zero infrastructure to operate. TypeScript
programmability for custom request handling. Built-in
[API key management](https://zuplo.com/docs/articles/api-key-management),
[rate limiting](https://zuplo.com/docs/policies/rate-limit-inbound), and
automatic
[developer portal generation](https://zuplo.com/docs/articles/developer-portal).
Deploys to 300+ edge locations with sub-50ms latency globally.

**Trade-offs**: Does not provide built-in gRPC-JSON transcoding or gRPC-Web
bridging at the gateway layer. If your primary protocol is gRPC end-to-end, a
proxy like Envoy is the stronger choice for the gRPC data plane.

### Kong Gateway

Kong supports gRPC proxying and gRPC-Web through dedicated plugins. It can
terminate gRPC traffic and apply rate limiting, authentication, and logging
policies.

**Strengths**: Large plugin ecosystem, supports multiple protocols alongside
gRPC, available as both open-source and enterprise.

**Trade-offs**: gRPC support requires separate plugin configuration. Lua-based
plugins are less familiar to most development teams. Self-hosted deployments
need PostgreSQL or DB-less declarative configuration (Cassandra support was
removed in Kong 3.4).

### Traefik

Traefik supports gRPC proxying over HTTP/2 and integrates with Kubernetes
Ingress and the Gateway API (including `GRPCRoute`). It provides automatic
service discovery and certificate management.

**Strengths**: Easy Kubernetes integration, automatic Let's Encrypt
certificates, built-in support for `GRPCRoute`.

**Trade-offs**: Limited gRPC-specific features compared to Envoy. No built-in
transcoding — you need a separate grpc-gateway or Envoy sidecar for gRPC-to-REST
translation.

### NGINX

NGINX supports gRPC proxying via its `ngx_http_grpc_module` and is widely
deployed and well-understood by operations teams.

**Strengths**: Ubiquitous, high performance, extensive documentation.

**Trade-offs**: No built-in gRPC-JSON transcoding. Configuration is file-based
and less developer-friendly. No developer portal or API key management.

### The Hybrid Approach

Many production architectures combine multiple gateways. A common pattern:

- **Envoy** handles gRPC-specific concerns — transcoding, gRPC-Web, L7 load
  balancing, and health checking for internal service-to-service traffic.
- **Zuplo** sits at the edge, managing the external REST API that consumers
  interact with — handling authentication, rate limiting, developer portal, and
  API key lifecycle.

This gives you best-in-class gRPC support internally and a developer-friendly,
fully managed API experience externally. For a deeper comparison of Envoy and
Zuplo, see
[Zuplo vs Envoy Proxy](https://zuplo.com/api-gateways/envoy-alternative-zuplo).

## Best Practices

### Start with Passthrough Proxying

If your clients can speak gRPC natively, proxy gRPC traffic through the gateway
without transcoding. This preserves gRPC's performance benefits and avoids the
serialization overhead of JSON conversion. Add transcoding only for clients that
genuinely need REST/JSON.

### Use L7 Load Balancing from Day One

Do not wait for uneven load distribution to become a production incident.
Configure Layer 7 load balancing for gRPC traffic as part of your initial
deployment. In Kubernetes, this means using an L7-aware ingress controller or
service mesh rather than relying on `kube-proxy`.

### Centralize Authentication at the Gateway

Let the gateway handle credential validation (API keys, JWTs, mTLS) and pass
authenticated identity to backends via gRPC metadata. This keeps your services
focused on business logic and ensures consistent security across your entire API
surface.

### Implement Health Checks with the gRPC Protocol

Use the standard gRPC health checking protocol for your services, not HTTP
health endpoints. This ensures your gateway and load balancer can accurately
assess service health using the same protocol as your production traffic.

### Monitor the Four Golden Signals

Track request rate, error rate, latency, and saturation at the gateway for every
gRPC service and method. These metrics are your first line of defense against
performance regressions and capacity issues.

### Plan Your Transcoding Strategy

If you need both gRPC and REST interfaces, decide early whether to use
gateway-level transcoding (Envoy, grpc-gateway) or maintain separate REST and
gRPC service implementations. Gateway-level transcoding reduces code duplication
but adds a dependency on annotation correctness.

## Further Reading

Ready to manage your API gateway for gRPC and REST workloads?
[Sign up for Zuplo](https://portal.zuplo.com/signup) to deploy a fully managed
API gateway in minutes, or explore these related resources:

- [REST or gRPC? A Guide to Efficient API Design](/learning-center/rest-or-grpc-guide)
  — Understand the trade-offs between REST and gRPC for your architecture
- [gRPC API Gateway: Bridging the Gap Between REST and gRPC](/blog/grpc-api-gateway)
  — A hands-on guide to building a gRPC-to-REST bridge with code generation
- [API Gateway Patterns](/learning-center/api-gateway-patterns) — Common
  architectural patterns for API gateways
- [Edge-Native API Gateway Architecture](/learning-center/edge-native-api-gateway-architecture)
  — How edge-native gateways reduce latency for global APIs