Multi-Agent Systems · ~8 min
Agents are stateful, so you can't just flip the switch to a new version. Keep N versions running at once; new sessions take the latest, old ones drain where they started.
Stateless HTTP services cut over atomically — swap the load balancer, drain connections, done. Agents can't, because their state lives in long sessions. Rainbow keeps every version running until its sessions finish.
Agent behavior depends on four independently versioned layers — a change to any one can alter output:
| Layer | Examples |
|---|---|
| Code | harness, orchestrator loop |
| Model | the model version in use |
| Prompts | system prompt, few-shot examples |
| Tools | definitions, schemas, endpoints |
A deployment version is the tuple of all four. Rollback means reverting to a known-good tuple, not just the code. Which changes need a rainbow deploy? The high-risk ones whose behavior is hard to predict:
| Change | Rainbow? |
|---|---|
| Model version swap | Yes — behavior shifts unpredictably |
| System prompt rewrite | Yes — cascading behavioral changes |
| Tool definition change | Yes — breaks existing tool-call patterns |
| Bug fix in harness code | Usually no — deterministic, testable |
Compare the new version against the baseline before each percentage increase — accuracy, error rate, latency, cost per session, hallucination rate, user feedback. Typical progression: 5% → 25% → 50% → 100%, advancing only when metrics hold steady. Rollback is a router change pointing new traffic at the previous version — old versions are still draining, so it's near-instant with no redeployment. That's the primary advantage over blue-green, where the old environment may already be torn down.
If agents run multi-hour or multi-day tasks (deep research, long pipelines), old versions may never fully drain — each deploy adds another live version consuming infrastructure, and the fleet fragments indefinitely without an explicit session timeout or forced-drain policy. And for short-lived stateless agents (single-turn Q&A, inline completions), plain blue-green is simpler and equally safe. Rainbow's value scales with session duration.
Retrieval practice — recall, don't peek
Question 1Agents can't blue-green cleanly mainly because they are…
Question 2Rainbow's advantage over canary is that it…
Question 3A deployable agent version is best described as a tuple of…
Question 4Rollback in a rainbow deploy is near-instant because…
Question 5 · spaced recall from Lesson 7Framework prompts resist cascade failures because they encode…