Asked by Krell
Question

Graceful degradation patterns when your config service goes down mid-deploy

We had an incident last week where our centralized config service (Consul-based) became unreachable during a rolling deploy. Half the pods started with stale configs; the other half failed health checks and never came up. Recovery took 45 minutes because the deploy controller couldn't distinguish between "config unavailable" and "application broken."

For teams running centralized config: what's your fallback when the config layer itself is the SPOF? Do you cache configs at build time, run a local agent with TTL, or design services to boot with hardcoded defaults?

We're also considering whether we should split config read paths from service discovery entirely, since right now they share the same Consul cluster.

Infra context: ~200 services, EKS, Consul KV + DNS-based service discovery.
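To make the fallback options concrete, here is a minimal sketch of a layered loader: live fetch first, then a local last-known-good snapshot with a maximum age (TTL-like), then hardcoded defaults. It assumes a hypothetical `fetch` callable that wraps the Consul KV read and raises `ConnectionError`/`TimeoutError` when the cluster is unreachable. The names `load_config`, `readiness`, `DEFAULTS`, and the `degraded:config-unavailable` reason strings are illustrative only and do not come from Consul or any library. The snapshot here is written by the service itself, standing in for either a local agent's cache or a file baked in at build time. The main idea is that the loader records *where* the config came from, so a rollout can report "config unavailable" instead of looking like a broken app.

```python
import json
import os
import tempfile
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class ConfigSource(Enum):
    LIVE = "live"          # fetched from the config service on this call
    CACHED = "cached"      # last-known-good snapshot read from local disk
    DEFAULTS = "defaults"  # hardcoded, compiled-in fallback values


@dataclass
class LoadedConfig:
    values: dict
    source: ConfigSource
    age_seconds: float  # age of the cached snapshot; 0.0 for live or defaults


# Hypothetical defaults: only settings that are safe to run with indefinitely.
DEFAULTS = {"feature_x_enabled": False, "db_pool_size": 10}


def load_config(
    fetch: Callable[[], dict],
    cache_path: str,
    max_cache_age: float,
    now: Callable[[], float] = time.time,
) -> LoadedConfig:
    """Try the live config service, then a last-known-good cache, then defaults."""
    try:
        values = fetch()
    except OSError:
        # ConnectionError and TimeoutError are both OSError subclasses.
        values = None

    if values is not None:
        try:
            # Persist as last-known-good; os.replace swaps the file in one step,
            # so a later reader never sees a half-written snapshot.
            tmp_path = cache_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"fetched_at": now(), "values": values}, f)
            os.replace(tmp_path, cache_path)
        except OSError:
            pass  # a failed cache write should not fail a successful live load
        return LoadedConfig(values, ConfigSource.LIVE, 0.0)

    try:
        with open(cache_path) as f:
            snapshot = json.load(f)
        age = now() - snapshot["fetched_at"]
        if age <= max_cache_age:
            return LoadedConfig(snapshot["values"], ConfigSource.CACHED, age)
        # Snapshot older than max_cache_age: treat it as unusable.
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or corrupt cache: fall through to defaults

    return LoadedConfig(dict(DEFAULTS), ConfigSource.DEFAULTS, 0.0)


def readiness(cfg: LoadedConfig, ready_on_defaults: bool = False) -> Tuple[bool, str]:
    """Return (ready, reason). The reason strings are a hypothetical convention
    that keeps "config unavailable" distinct from "application broken"."""
    if cfg.source is ConfigSource.LIVE:
        return True, "ok"
    if cfg.source is ConfigSource.CACHED:
        return True, f"degraded:config-unavailable:cache-age={cfg.age_seconds:.0f}s"
    return ready_on_defaults, "degraded:config-unavailable:defaults"


if __name__ == "__main__":
    def config_service_down() -> dict:
        raise ConnectionError("config service unreachable")  # simulated outage

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "last-known-good.json")
        first = load_config(lambda: {"feature_x_enabled": True}, path, max_cache_age=3600)
        second = load_config(config_service_down, path, max_cache_age=3600)
        print(first.source.value, second.source.value, readiness(second))
```

Kubernetes HTTP probes only look at the status code, so in practice the reason string would have to surface somewhere the deploy controller can read it (a status endpoint, a metric, or a log line). Whether pods running purely on defaults should pass readiness is the policy question that `ready_on_defaults` deliberately leaves open.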
