When you're releasing new API versions, you want confidence that changes work before they hit all your users. Canary routing lets you test new backends with a small subset of traffic first: internal employees, beta testers, or a percentage of requests. If something breaks, only the canary group is affected.
This post covers what canary routing is, when to use it, and how to implement it on your API using Zuplo's custom policies.
Canary routing is a good fit if you are:
- Rolling out new API versions and want employees to test them first
- Running a beta program with select customers
- Gradually shifting traffic to new infrastructure
- Dogfooding features internally before public release
What Is Canary Routing?
Canary routing directs a subset of API traffic to a different backend than your main production environment. The name comes from the "canary in a coal mine" concept: if something goes wrong, you detect it early with a small group rather than affecting everyone.
Unlike blue-green deployments (which switch all traffic at once), canary routing lets you:
- Test with real production traffic patterns
- Catch issues before they reach all users
- Roll back instantly by removing the routing rule
- Gradually increase exposure as confidence grows
Common Canary Routing Strategies
1. User-Based Routing
Route specific users to the canary backend based on their identity. This works well for:
- Internal employees testing new features
- Beta program participants
- Premium customers getting early access
The user's email, ID, or another identifier determines which backend handles their request.
2. Header or Query Parameter Routing
Let users opt into the canary experience by passing a header (x-stage: canary)
or query parameter (?stage=canary). This is useful for:
- QA teams testing specific environments
- Developers debugging against staging
- Support staff reproducing customer issues
3. Percentage-Based Routing
Route a percentage of all traffic to the canary backend. A 10% canary deployment means roughly 1 in 10 requests goes to the new version. This approach:
- Tests with realistic traffic distribution
- Requires no user-side changes
- Scales confidence gradually (start at 1%, move to 5%, then 10%, etc.)
Implementing Canary Routing with Zuplo
Zuplo's custom policies let you implement any of these strategies. The policy
runs before your request handler and picks a backend for each request by
setting a value such as context.custom.backendUrl, which the handler then uses
as the upstream URL.
Here's an implementation that checks for a canary header, query parameter, or user identity:
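A minimal sketch of that decision logic follows. It is written as a pure function so it runs on its own; in a Zuplo policy the inputs would come from the request (the x-stage header, the stage query parameter, request.user.sub) and from environment variables. The names chooseBackend and API_URL_PRODUCTION and the reason labels are assumptions for illustration, and CANARY_USERS is assumed to be a comma-separated list of user IDs.

```typescript
// Illustrative stand-ins for what a gateway policy reads from the request.
interface RoutingInput {
  stageHeader?: string; // value of the x-stage header, if sent
  stageQuery?: string; // value of the ?stage= query parameter, if sent
  userSub?: string; // authenticated user id (request.user.sub), if any
}

// Illustrative stand-in for environment variables.
interface RoutingEnv {
  API_URL_PRODUCTION: string; // hypothetical name for the production URL
  API_URL_CANARY?: string; // canary routing is skipped when this is unset
  CANARY_USERS?: string; // assumed format: comma-separated user ids
}

type RoutingReason = "header" | "query" | "user" | "default" | "canary-not-configured";

interface RoutingDecision {
  backend: "canary" | "production";
  reason: RoutingReason;
  backendUrl: string;
}

function chooseBackend(input: RoutingInput, env: RoutingEnv): RoutingDecision {
  const canaryUrl = env.API_URL_CANARY;
  if (!canaryUrl) {
    // No canary backend configured: always fall back to production.
    return { backend: "production", reason: "canary-not-configured", backendUrl: env.API_URL_PRODUCTION };
  }

  const allowlist = (env.CANARY_USERS || "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  let reason: RoutingReason = "default";
  if (input.stageHeader === "canary") {
    reason = "header";
  } else if (input.stageQuery === "canary") {
    reason = "query";
  } else if (input.userSub !== undefined && allowlist.indexOf(input.userSub) !== -1) {
    reason = "user";
  }

  if (reason === "default") {
    return { backend: "production", reason, backendUrl: env.API_URL_PRODUCTION };
  }
  return { backend: "canary", reason, backendUrl: canaryUrl };
}
```

In a policy, the returned backendUrl is the value you would store in context.custom.backendUrl, and reason is worth including in the log entry for the request.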
Note that the code checks for environment.API_URL_CANARY before routing to canary. If the environment variable isn't set, requests fall back to production. The policy sets context.custom.backendUrl, which the URL Rewrite handler then uses to forward the request to the correct backend.
Policy Ordering
Your authentication policy should run before the canary routing policy. This
ensures request.user.sub is populated when the canary policy evaluates
user-based routing rules.
If you put canary routing first, request.user will be undefined and user-based
routing won't work.
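The effect of ordering can be seen in a toy model of a policy pipeline. This is not Zuplo's runtime; Req, Policy, runPolicies, and the token table are hypothetical stand-ins that only mimic "each policy sees what earlier policies produced".

```typescript
// Toy request shape: auth fills in `user`, canary routing fills in `backend`.
interface Req {
  authToken?: string;
  user?: { sub: string };
  backend?: "canary" | "production";
}
type Policy = (req: Req) => Req;

// Hypothetical token-to-user table standing in for a real auth provider.
const TOKENS: { [token: string]: string } = { "token-alice": "alice" };
const canaryUsers = ["alice"];

// Stand-in for an authentication policy: populates req.user from the token.
const authenticate: Policy = (req) => {
  const sub = req.authToken !== undefined ? TOKENS[req.authToken] : undefined;
  return sub !== undefined ? { ...req, user: { sub } } : req;
};

// Stand-in for user-based canary routing: needs req.user to already exist.
const routeCanary: Policy = (req) => ({
  ...req,
  backend: req.user !== undefined && canaryUsers.indexOf(req.user.sub) !== -1 ? "canary" : "production",
});

// Run policies in order, feeding each one the previous policy's output.
function runPolicies(policies: Policy[], req: Req): Req {
  return policies.reduce((current, policy) => policy(current), req);
}

const incoming: Req = { authToken: "token-alice" };
const authFirst = runPolicies([authenticate, routeCanary], incoming);
const canaryFirst = runPolicies([routeCanary, authenticate], incoming);

console.log(authFirst.backend); // "canary"
console.log(canaryFirst.backend); // "production": user was not set yet
```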
Percentage-Based Canary Routing
For gradual rollout, you can route a percentage of traffic to the canary backend instead of relying on user lists or headers. Replace the routing logic with a hash-based approach:
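A minimal sketch of one such approach is below. It assumes you have a stable client identifier (a user ID, API key ID, or client IP) and that the percentage comes from a CANARY_PERCENTAGE setting. The function names and the particular hash (the simple "hash * 31 + character code" scheme) are illustrative choices; any stable, reasonably uniform hash would do.

```typescript
// Map a stable client identifier to a bucket in the range 0..99.
function bucketFor(clientId: string): number {
  let hash = 0;
  for (let i = 0; i < clientId.length; i++) {
    // (hash << 5) - hash is hash * 31; "| 0" keeps the value a 32-bit integer.
    hash = ((hash << 5) - hash + clientId.charCodeAt(i)) | 0;
  }
  // ">>> 0" reinterprets the result as unsigned so the modulo is never negative.
  return (hash >>> 0) % 100;
}

// True when the client falls inside the canary percentage.
// Out-of-range values are clamped to 0..100; NaN routes everyone to production.
function isInCanary(clientId: string, canaryPercentage: number): boolean {
  const pct = Math.min(100, Math.max(0, canaryPercentage));
  return bucketFor(clientId) < pct;
}

// The same identifier always yields the same answer for a given percentage.
console.log(isInCanary("user_123", 10) === isInCanary("user_123", 10)); // true
```

Because each client keeps the same bucket, raising the percentage only adds clients to the canary group; nobody who was already on canary gets moved back.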
The hash ensures the same client consistently hits the same backend. Without this, a user might flip between canary and production on consecutive requests, making debugging difficult.
You can combine this with user-based routing: check the CANARY_USERS list
first, then fall back to percentage-based routing for everyone else.
Configuration
Set up your environment variables in the Zuplo dashboard:
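As a sketch of what those variables might contain and how a policy could read them defensively: API_URL_CANARY, CANARY_USERS, and CANARY_PERCENTAGE are the names used in this post, while API_URL_PRODUCTION, the example values, and the parseCanaryConfig helper are assumptions for illustration.

```typescript
interface CanaryConfig {
  productionUrl: string;
  canaryUrl?: string; // undefined means canary routing is off
  canaryUsers: string[];
  canaryPercentage: number; // always within 0..100
}

// Turn raw environment strings into a validated configuration object.
function parseCanaryConfig(env: { [name: string]: string | undefined }): CanaryConfig {
  const productionUrl = env["API_URL_PRODUCTION"]; // hypothetical variable name
  if (!productionUrl) {
    throw new Error("API_URL_PRODUCTION must be set");
  }

  // Missing or non-numeric percentages are treated as 0 (no percentage routing).
  const rawPercentage = Number(env["CANARY_PERCENTAGE"] || "0");
  const canaryPercentage = isNaN(rawPercentage) ? 0 : Math.min(100, Math.max(0, rawPercentage));

  // Assumed format: comma-separated user ids, whitespace ignored.
  const canaryUsers = (env["CANARY_USERS"] || "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  return { productionUrl, canaryUrl: env["API_URL_CANARY"] || undefined, canaryUsers, canaryPercentage };
}

// Example values only; substitute your own backends and user ids.
const exampleConfig = parseCanaryConfig({
  API_URL_PRODUCTION: "https://api.example.com",
  API_URL_CANARY: "https://canary.example.com",
  CANARY_USERS: "user_123, user_456",
  CANARY_PERCENTAGE: "10",
});
console.log(exampleConfig.canaryUsers.length); // 2
```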
Then add the policy to your route configuration:
Testing Your Canary Setup
Once deployed, test each routing method:
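One way to organize those checks is as a small list of request descriptions, each with the backend you expect to serve it. The sketch below only builds the descriptions; send them with whatever HTTP client you prefer. The base URL and path are placeholders, buildCanaryChecks is a hypothetical helper, and the "no indicator" expectation assumes the caller is not on the allowlist and percentage routing is at 0.

```typescript
interface CanaryCheck {
  name: string;
  url: string;
  headers: { [name: string]: string };
  expectedBackend: "canary" | "production";
}

// Build one check per opt-in method, plus a baseline with no canary indicator.
function buildCanaryChecks(baseUrl: string, path: string): CanaryCheck[] {
  const url = baseUrl.replace(/\/+$/, "") + path; // avoid a double slash
  return [
    { name: "no indicator", url, headers: {}, expectedBackend: "production" },
    { name: "x-stage header", url, headers: { "x-stage": "canary" }, expectedBackend: "canary" },
    { name: "stage query parameter", url: url + "?stage=canary", headers: {}, expectedBackend: "canary" },
  ];
}

const checks = buildCanaryChecks("https://your-gateway.example.com/", "/v1/items");
for (const check of checks) {
  console.log(check.name + " -> " + check.url + " (expect " + check.expectedBackend + ")");
}
```

User-based routing needs a fourth check that sends credentials for a user on the CANARY_USERS list; the exact header depends on your authentication policy.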
To confirm which backend handled each request, add a response header like
X-Backend-Type: canary using the
Set Response Headers policy.
This makes it easy to verify routing during testing without checking logs.
Monitoring Your Canary Deployment
The logging in our policy emits structured data with each request: which backend handled it, why, and who made the request. You can use this to track traffic distribution between canary and production.
If you're using a logging provider like Datadog, you can create dashboards that
filter logs by backend:canary vs backend:production to compare traffic
distribution and error rates between the two backends.
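If you export those logs, the same comparison is easy to compute yourself. The sketch below assumes each log entry carries a backend field and the HTTP status returned to the caller; both field names and the summarize helper are hypothetical, and only 5xx responses are counted as errors.

```typescript
interface RoutingLogEntry {
  backend: "canary" | "production";
  status: number; // HTTP status returned to the caller (assumed field)
}

interface BackendStats {
  requests: number;
  errors: number;
  errorRate: number; // errors / requests, 0 when there were no requests
  trafficShare: number; // requests / total requests across both backends
}

function summarize(entries: RoutingLogEntry[]): { canary: BackendStats; production: BackendStats } {
  const counts = {
    canary: { requests: 0, errors: 0 },
    production: { requests: 0, errors: 0 },
  };
  for (const entry of entries) {
    counts[entry.backend].requests += 1;
    if (entry.status >= 500) {
      counts[entry.backend].errors += 1;
    }
  }
  const toStats = (c: { requests: number; errors: number }): BackendStats => ({
    requests: c.requests,
    errors: c.errors,
    errorRate: c.requests === 0 ? 0 : c.errors / c.requests,
    trafficShare: entries.length === 0 ? 0 : c.requests / entries.length,
  });
  return { canary: toStats(counts.canary), production: toStats(counts.production) };
}
```

A canary error rate that is clearly above production's at a similar traffic mix is the signal to pause or roll back.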
Zuplo supports many logging providers including Datadog, New Relic, Dynatrace, and Google Cloud Logging.
Best Practices for Canary Routing
Start small. Begin with a handful of volunteer testers, expand to engineering, then all employees, before considering percentage-based routing for external users.
Log routing decisions. Include the routing reason (header, user, percentage) in your logs so you can debug issues quickly.
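Have a rollback plan. If the canary backend has problems, you should be able
to route all traffic back to production immediately. Setting
CANARY_PERCENTAGE=0 stops percentage-based routing; unsetting API_URL_CANARY
or removing the canary policy also stops header-based and user-based routing.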
Use sticky sessions for percentage routing. The hash-based approach ensures users don't flip between backends mid-session, which would make debugging nearly impossible.
Security Considerations
The routing strategies above have different security profiles.
User-based routing is the most secure since it requires authentication and
an explicit allowlist. Only users in your CANARY_USERS list can access the
canary backend.
Header and query parameter routing is open by default. Anyone who discovers
?stage=canary or the x-stage header can access your canary backend. This is
fine for:
- Public beta programs where you want easy opt-in
- Canary backends that are stable but just not fully rolled out
If your canary backend contains unreleased features, incomplete functionality, or could expose sensitive data, require authentication before honoring canary indicators:
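A minimal sketch of that check, assuming the authentication policy has already run and left the user ID on the request. CanaryRequest, wantsCanary, and mayUseCanary are hypothetical names for illustration.

```typescript
interface CanaryRequest {
  stageHeader?: string; // value of the x-stage header, if sent
  stageQuery?: string; // value of the ?stage= query parameter, if sent
  userSub?: string; // set only after the authentication policy has run
}

// Did the caller ask for the canary backend?
function wantsCanary(req: CanaryRequest): boolean {
  return req.stageHeader === "canary" || req.stageQuery === "canary";
}

// Honor the opt-in only for authenticated callers. Anonymous requests that
// send the header or query parameter are served by production instead.
function mayUseCanary(req: CanaryRequest): boolean {
  const isAuthenticated = req.userSub !== undefined && req.userSub.length > 0;
  return isAuthenticated && wantsCanary(req);
}

console.log(mayUseCanary({ stageHeader: "canary" })); // false
console.log(mayUseCanary({ stageHeader: "canary", userSub: "user_123" })); // true
```

For a stricter variant, also require that the user appears in CANARY_USERS before honoring the header or query parameter.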
Percentage-based routing is transparent to users because they don't control which backend they hit. However, ensure your canary backend has the same security policies as production.
When Not to Use Canary Routing
Canary routing does add complexity, so you may want to skip it if:
- Your API is stateless and easily reversible. If you can deploy, observe, and roll back in minutes with no user impact, a simpler deploy-and-monitor approach may suffice.
- You have comprehensive staging environments. If your staging environment accurately mirrors production traffic patterns, you may catch issues there instead.
- Breaking changes require client updates. Canary routing works best for backend changes that are transparent to callers. If your new version has a different request/response schema, clients need to update anyway, so gradual rollout won't help.
- You're testing database migrations. Canary routing splits traffic, but if both backends share a database, a bad migration affects both. Use feature flags or database-level strategies instead.
Consider canary routing when you need confidence that infrastructure or behavioral changes work at scale before full rollout, not for every deployment.
Why Use Zuplo for Canary Routing?
You may already know other ways to implement canary routing. Here's how they compare:
| Approach | User-based routing | Percentage routing | Infrastructure required |
|---|---|---|---|
| Feature flags (LaunchDarkly, Split) | Requires SDK integration | Yes | SDK in your application |
| Load balancer (AWS ALB, Cloudflare) | No | Yes | Load balancer configuration |
| Zuplo | Yes, via auth context | Yes | None beyond existing gateway |
Feature flags are great for toggling features within your code, but routing to entirely different backends means adding routing logic on top of flag evaluation. Load balancers can do weighted routing, but they lack context about who's making the request: you can route 10% of traffic, but you can't easily say "route employees to canary."
Zuplo sits at your API gateway layer, so you get access to request context (headers, auth, user identity) for intelligent routing decisions, edge deployment, and the same policy framework you're already using for auth, rate limiting, and validation. Custom policies are available on all Zuplo plans, including the free tier.
Try It Yourself
Canary Routing Example
A complete working example that implements user-based and percentage-based canary routing. Deploy directly to your Zuplo account or run locally.