Deploy the new version. Test it. Switch traffic. If something breaks, switch back. Instant rollback. Sounds ideal. The database migrations are where it gets complicated.

The Pattern

Blue-green runs two identical production environments. Blue is live. Green is idle. You deploy your new version to green and test it against real infrastructure, with no live traffic. When you’re confident, you flip the load balancer to point to green. Green is now live. Blue sits idle, ready to take traffic back instantly if anything goes wrong.

Rollback is a load balancer change. No redeploy. No waiting. Flip the switch and the old version is live again within seconds. That’s the appeal.
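To make the "rollback is just a pointer change" idea concrete, here is a minimal sketch. `BlueGreenRouter` is a hypothetical toy stand-in for a load balancer, not any real product's API, and the backend addresses are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlueGreenRouter:
    """Toy load balancer that sends all traffic to exactly one environment."""

    backends: Dict[str, str]  # environment name -> upstream address (hypothetical)
    live: str = "blue"
    history: List[str] = field(default_factory=list)

    def switch_to(self, env: str) -> None:
        """Point all traffic at `env`, remembering what was live before."""
        if env not in self.backends:
            raise ValueError(f"unknown environment: {env}")
        self.history.append(self.live)
        self.live = env

    def rollback(self) -> None:
        """Restore the previously live environment. Nothing is redeployed."""
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.live = self.history.pop()

    def route(self) -> str:
        """Return the upstream address that should receive the next request."""
        return self.backends[self.live]


router = BlueGreenRouter({"blue": "10.0.0.1:8080", "green": "10.0.0.2:8080"})
router.switch_to("green")  # cutover: green now takes all traffic
router.rollback()          # something broke: blue is live again
```

Neither call touches the environments themselves, which is why the switch is fast. A real load balancer also has to deal with in-flight connections, which this sketch ignores.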

The Database Problem

The code switch is instant. The database migration is not.

If your new version requires a schema change, you can’t just run the migration and flip traffic. Blue is still running against the same database, and it has to keep working if you roll back. A destructive migration, such as dropping or renaming a column blue still reads, breaks the old version and takes your rollback with it. This is why blue-green pushes you toward expand-contract migrations: first add the new column (backward compatible), then deploy green, which writes to both the old and new columns, and only after blue is fully retired drop the old column.
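Here is a rough sketch of expand-contract using Python's built-in `sqlite3`, so it runs anywhere. The `users` table, the `name` and `display_name` columns, and every function name are assumptions for illustration. The contract step rebuilds the table instead of using `DROP COLUMN`, only so the sketch does not depend on a recent SQLite version.

```python
import sqlite3


def create_v1_schema(conn):
    """The schema blue (v1) was written against."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def expand(conn):
    """Expand: add the new column and backfill it. Blue keeps working."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
    conn.commit()


def blue_insert(conn, name):
    """v1 code: only knows the old column."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def green_insert(conn, name):
    """v2 code: dual-writes so blue can still read rows after a rollback."""
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )


def green_read(conn, user_id):
    """v2 code: prefers the new column, falls back to rows blue wrote."""
    rows = conn.execute(
        "SELECT COALESCE(display_name, name) FROM users WHERE id = ?", (user_id,)
    ).fetchall()
    return rows[0][0] if rows else None


def contract(conn):
    """Contract: run only after blue is retired. Drops the old column."""
    # Final backfill for rows blue wrote after the expand step.
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT)")
    conn.execute(
        "INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users"
    )
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
    conn.commit()
```

Between `expand` and `contract`, both `blue_insert` and `green_insert` work against the same table, which is what keeps rollback safe. Once `contract` has run, the fallback in `green_read` has to go too, since `name` no longer exists.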

graph TD
    A[Load Balancer] -->|All traffic| B[Blue - v1 live]
    C[Green - v2 being tested] --> D[Shared Database]
    B --> D
    E[Validation complete] --> F[Load Balancer switches]
    F -->|All traffic| C
    B --> G[Blue idles - instant rollback available]
    style A fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style B fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style D fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style E fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style F fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style G fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

Cost and Cloud

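You need double the compute during the transition period. In cloud environments with auto-scaling, you spin up the green fleet, switch, then terminate blue. The extra cost is bounded to the transition window, usually minutes. Keep in mind that instant rollback lasts only as long as blue stays up, so the longer you keep it as a safety net, the longer you pay for both fleets.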
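The arithmetic is simple enough to sketch. The fleet size, hourly rate, and window lengths below are made-up numbers, and `transition_cost` is a hypothetical helper rather than anything from a cloud SDK.

```python
def transition_cost(instances: int, hourly_rate: float, window_minutes: float) -> float:
    """Extra spend from running a second, identical fleet for the window."""
    return instances * hourly_rate * (window_minutes / 60)


# Hypothetical fleet: 20 instances at $0.50 per instance-hour.
quick_cutover = transition_cost(20, 0.50, 30)         # blue terminated after 30 minutes
keep_blue_a_day = transition_cost(20, 0.50, 24 * 60)  # blue kept 24 hours for rollback

print(f"30-minute window: ${quick_cutover:.2f}")
print(f"24-hour window:   ${keep_blue_a_day:.2f}")
```

The comparison shows the tradeoff: the extra cost scales linearly with how long you want rollback to stay available.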

At Salesforce

We used a blue-green deploy for a major refactor of our configuration service. The part nobody thought about: session state. Users mid-session on blue were routed to green after the switch. Green had no knowledge of their sessions. They were logged out silently. We added session persistence via Redis before the next blue-green switch so sessions survived the cutover. The lesson was that stateful assumptions hide until the switch happens.
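The failure mode is easy to reproduce in miniature. This sketch is not the code we ran: a plain Python dict stands in for the session store, with a dict shared between environments playing the role Redis played for us. The `Environment` class and its methods are hypothetical.

```python
import uuid
from typing import Dict, Optional


class Environment:
    """One deployment (blue or green) that looks sessions up in a store."""

    def __init__(self, name: str, sessions: Dict[str, str]) -> None:
        self.name = name
        self.sessions = sessions  # session id -> user name

    def login(self, user: str) -> str:
        """Create a session and return its id (the value a cookie would carry)."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = user
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        """Return the logged-in user, or None if this environment has no such session."""
        return self.sessions.get(session_id)


# Each environment keeps its own sessions: the cutover logs the user out.
blue, green = Environment("blue", {}), Environment("green", {})
sid = blue.login("ada")
lost = green.whoami(sid)       # None: green has never seen this session

# Both environments use one external store: the session survives the cutover.
shared_store: Dict[str, str] = {}  # stand-in for an external store such as Redis
blue, green = Environment("blue", shared_store), Environment("green", shared_store)
sid = blue.login("ada")
survived = green.whoami(sid)   # "ada"
```

The only difference between the two cases is where the state lives, which is the kind of assumption that stays invisible until the switch.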

What I’m Learning

Blue-green is most effective when your application is stateless and your database migrations are backward-compatible. The moment you have in-memory state or a destructive schema change, the pattern breaks down and those cases need explicit handling.

What’s the trickiest part of blue-green you’ve hit, and how did you handle it?