Schema migration strategies for zero-downtime deploys
Planning to move from a monolith to microservices. How do you handle DB schema changes that affect multiple services simultaneously?
Expand-Contract pattern is your friend. Add the new column, dual-write, backfill, switch reads, stop writing to old, drop old. Slow but safe.
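For anyone who wants to see the steps concretely, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the `name` and `full_name` columns, and the tiny batch size are all made up for illustration, and a production database will behave differently under load. The backfill runs in small committed batches so that no single statement touches the whole table.

```python
import sqlite3

BATCH_SIZE = 2  # hypothetical; tiny so the demo needs several batches


def expand(conn):
    # Step 1 (expand): add the new nullable column next to the old one.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def dual_write(conn, user_id, name):
    # Step 2: while both columns exist, the application writes to both.
    conn.execute(
        "INSERT INTO users (id, name, full_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def backfill(conn, batch_size=BATCH_SIZE):
    # Step 3: copy old values into the new column a few rows at a time,
    # committing after each batch so each transaction stays short.
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET full_name = name WHERE id IN ("
            "SELECT id FROM users "
            "WHERE full_name IS NULL AND name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def read_name(conn, user_id):
    # Step 4: reads switch to the new column once the backfill is complete.
    row = conn.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "ada"), (2, "grace"), (3, "alan")],
    )
    conn.commit()

    expand(conn)
    dual_write(conn, 4, "edsger")  # a write arriving mid-migration
    conn.commit()
    migrated = backfill(conn)
    # Steps 5 and 6 (stop writing the old column, then drop it) would
    # follow in later deploys, once nothing reads `name` any more.
    return conn, migrated
```

Each step would normally ship as its own deploy, which is where the "slow but safe" part comes from: any single step can be rolled back without losing data.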
The event sourcing approach complements Expand-Contract well for multi-service migrations. Instead of coupling services to a shared schema change, publish schema-change events through your message broker. Each service subscribes and migrates its own read models at its own pace. This avoids the coordination problem entirely: services stay loosely coupled even during major schema transitions. The trade-off is eventual consistency, but for most microservice architectures that is already the baseline assumption.
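A rough sketch of that idea, with an in-memory stand-in for the broker. Everything here is hypothetical (`Broker`, `SchemaChanged`, `Service`, the topic name); a real setup would use an actual message broker and a persistent read model. The point it tries to show is that each service records the event when it arrives and applies it whenever it chooses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChanged:
    # Hypothetical event shape: which table changed and what was added.
    table: str
    version: int
    added_columns: tuple


class Broker:
    """In-memory stand-in for a message broker (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class Service:
    def __init__(self, name):
        self.name = name
        self.schema_version = 1
        self.read_model_columns = ["id", "name"]
        self.pending = []

    def on_schema_changed(self, event):
        # Only record the event; the migration happens later, on this
        # service's own schedule.
        self.pending.append(event)

    def migrate(self):
        # Apply pending events in version order, skipping any already applied.
        for event in sorted(self.pending, key=lambda e: e.version):
            if event.version > self.schema_version:
                self.read_model_columns.extend(event.added_columns)
                self.schema_version = event.version
        self.pending.clear()


def demo():
    broker = Broker()
    orders, billing = Service("orders"), Service("billing")
    for svc in (orders, billing):
        broker.subscribe("schema.users", svc.on_schema_changed)

    broker.publish("schema.users", SchemaChanged("users", 2, ("full_name",)))
    orders.migrate()  # orders migrates now; billing has not caught up yet
    return orders, billing
```

After `demo()`, the two services sit on different schema versions until `billing.migrate()` runs, which is the eventual-consistency trade-off in miniature.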
We hit the same wall last quarter. The fix was twofold: (1) move the expensive validation to a pre-filter stage before the main handler, and (2) cache the result keyed on the request hash. Cut p99 latency from 800ms to 120ms. Happy to share the pattern if useful.
Expand-Contract is safe, but does it really work for high-volume tables? Lock contention during backfill can kill the DB. Have you tried using a replication slot to backfill asynchronously?