Rainbow Deployments

Agents are stateful, so you can't just flip the switch to a new version. Keep N versions running at once; new sessions take the latest, old ones drain where they started.

Why this, for you: the moment you operate agents in production, shipping a new prompt or model is a risk, not a routine. A small change cascades into large behavioral shifts (Part 3, Lesson 7), and an atomic cutover breaks every in-flight session. Rainbow deployment is how you ship agent changes without that breakage — and how you roll back in seconds.

Stateless HTTP services cut over atomically — swap the load balancer, drain connections, done. Agents can't, because their state lives in long sessions. Rainbow keeps every version running until its sessions finish.

1 Why agents can't blue-green

Stateful execution — conversation context, tool state, and multi-step plans persist across long sessions.
Behavioral sensitivity — small prompt, tool, or model changes cascade into large behavioral shifts.
Expensive restarts — a forced restart loses accumulated context and wastes compute.

Blue-green assumes atomic cutover. Canary improves on it with gradual traffic shifting but caps you at two concurrent versions. Rainbow removes the version ceiling: new sessions route to the latest version; existing sessions stay on whichever version started them and drain as they complete.
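The routing rule above can be sketched in a few lines. This is a minimal in-memory illustration, not a production router: the class name `RainbowRouter` and its methods are hypothetical, and a real deployment would likely keep the session-to-version map in a shared store.

```python
from dataclasses import dataclass, field


@dataclass
class RainbowRouter:
    """Pins each session to the version that was latest when it started."""
    latest: str
    pinned: dict[str, str] = field(default_factory=dict)  # session_id -> version

    def route(self, session_id: str) -> str:
        # Existing sessions stay where they started; new ones take the latest.
        return self.pinned.setdefault(session_id, self.latest)

    def deploy(self, version: str) -> None:
        # Only changes where NEW sessions go; pinned sessions are untouched.
        self.latest = version

    def finish(self, session_id: str) -> None:
        # A completed session releases its pin so its version can drain.
        self.pinned.pop(session_id, None)

    def live_versions(self) -> set[str]:
        # Versions that must keep running: the latest plus any with open sessions.
        return {self.latest, *self.pinned.values()}
```

Deploying three times while earlier sessions are still open leaves three live versions, which is the case canary cannot represent.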

2 Version is a tuple, not a commit

Agent behavior depends on four independently versioned layers — a change to any one can alter output:

Layer	Examples
Code	harness, orchestrator loop
Model	the model version in use
Prompts	system prompt, few-shot examples
Tools	definitions, schemas, endpoints

A deployment version is the tuple of all four. Rollback means reverting to a known-good tuple, not just the code. Which changes need a rainbow deploy? The high-risk ones whose behavior is hard to predict:

Change	Rainbow?
Model version swap	Yes — behavior shifts unpredictably
System prompt rewrite	Yes — cascading behavioral changes
Tool definition change	Yes — breaks existing tool-call patterns
Bug fix in harness code	Usually no — deterministic, testable
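One way to make the tuple concrete is an immutable record with one field per layer, plus a check that mirrors the table above. The names `AgentVersion`, `changed_layers`, and `needs_rainbow` are illustrative assumptions, and the field values stand in for whatever identifiers you already use (commit hashes, model names, content hashes).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentVersion:
    code: str     # harness / orchestrator build, e.g. a commit hash
    model: str    # model version identifier
    prompts: str  # hash or tag of system prompt and few-shot examples
    tools: str    # hash or tag of tool definitions and schemas


# Layers the table marks "Yes". Code changes are "usually no", so this
# default treats them as low risk; override it for risky harness changes.
HIGH_RISK_LAYERS = {"model", "prompts", "tools"}


def changed_layers(old: AgentVersion, new: AgentVersion) -> set[str]:
    """Names of the layers that differ between two version tuples."""
    return {
        f.name
        for f in fields(AgentVersion)
        if getattr(old, f.name) != getattr(new, f.name)
    }


def needs_rainbow(old: AgentVersion, new: AgentVersion) -> bool:
    """True if any high-risk layer changed between the two tuples."""
    return bool(changed_layers(old, new) & HIGH_RISK_LAYERS)
```

Because the record is frozen and hashable, a known-good tuple can be stored as a single value and restored as a whole on rollback.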

3 Monitoring, rollback, and the drain trap

Compare the new version against the baseline before each percentage increase: accuracy, error rate, latency, cost per session, hallucination rate, user feedback. Typical progression: 5% → 25% → 50% → 100% of new sessions, advancing only when metrics hold steady. Rollback is a router change that points new traffic at the previous version; old versions are still draining, so it's near-instant with no redeployment. That's the primary advantage over blue-green, where the old environment may already be torn down.
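The advance-or-roll-back decision can be sketched as a small gate. Everything here is an assumption for illustration: the three metrics are a subset of the list above, the 5% relative tolerance is arbitrary, and `holds_steady` and `next_stage` are hypothetical names.

```python
from dataclasses import dataclass

STAGES = [5, 25, 50, 100]  # percent of new sessions sent to the candidate


@dataclass(frozen=True)
class Metrics:
    accuracy: float          # higher is better
    error_rate: float        # lower is better
    cost_per_session: float  # lower is better


def holds_steady(baseline: Metrics, candidate: Metrics, tolerance: float = 0.05) -> bool:
    """True if the candidate is within a relative tolerance of baseline on every metric."""
    return (
        candidate.accuracy >= baseline.accuracy * (1 - tolerance)
        and candidate.error_rate <= baseline.error_rate * (1 + tolerance)
        and candidate.cost_per_session <= baseline.cost_per_session * (1 + tolerance)
    )


def next_stage(current: int, baseline: Metrics, candidate: Metrics) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback."""
    if not holds_steady(baseline, candidate):
        return 0  # rollback: the router sends all new sessions to the previous version
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

Returning 0 changes only the routing weight. Nothing is redeployed, because the previous version is still running while it drains.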

Long-lived sessions never drain

If agents run multi-hour or multi-day tasks (deep research, long pipelines), old versions may never fully drain: each deploy adds another live version consuming infrastructure, and without an explicit session timeout or forced-drain policy the fleet fragments indefinitely. Conversely, for short-lived stateless agents (single-turn Q&A, inline completions), plain blue-green is simpler and equally safe. Rainbow's value scales with session duration.
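A drain policy can be as simple as a maximum session age applied to old versions. The sketch below assumes a hypothetical `Session` record and a `plan_drain` helper; what "force drain" actually does (checkpoint and migrate, or terminate) is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    session_id: str
    version: str
    started_at: float  # seconds since epoch


def plan_drain(sessions, latest, now, max_age_s):
    """Split old-version sessions into overdue ones and ones still allowed to run.

    Returns (ids_to_force_drain, versions_still_pinned).
    """
    force, pinned = [], set()
    for s in sessions:
        if s.version == latest:
            continue  # sessions on the latest version are never force-drained
        if now - s.started_at > max_age_s:
            force.append(s.session_id)  # past the deadline: drain it
        else:
            pinned.add(s.version)  # still within budget: its version stays up
    return force, pinned
```

Any old version that does not appear in the second return value has no remaining sessions within budget and can be retired once the forced drains complete.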

↪ Your win: ship agent versions without breaking sessions

Run N versions concurrently — new sessions get the latest; existing sessions drain where they started.
Track the version tuple — code × model × prompt × tools; roll back the whole tuple, not just code.
Rainbow the high-risk changes — model, prompt, tool definitions; skip it for deterministic bug fixes.
Advance on steady metrics — 5 → 25 → 50 → 100%, watching accuracy, errors, cost, hallucination.
Set a drain policy for long sessions — or old versions never retire and the fleet fragments.

Retrieval practice — recall, don't peek

Question 1: Agents can't blue-green cleanly mainly because they are…

Question 2: Rainbow's advantage over canary is that it…

Question 3: A deployable agent version is best described as a tuple of…

Question 4: Rollback in a rainbow deploy is near-instant because…

Question 5 · spaced recall from Lesson 7: Framework prompts resist cascade failures because they encode…

Ask me anything. Want a label-selector rollout for a real agent service, or a drain policy that keeps long-running sessions from pinning old versions forever? That closes Part 3. Next: the Capstone — the whole course as a decision table.