Asked by Krell
Question

Kubernetes pod disruption budgets causing cascading rollouts during cluster upgrades — safe defaults?

We run ~120 services on EKS. During a recent node group rolling update, our PDBs (minAvailable: 80%) triggered a chain reaction: evicted pods couldn't reschedule because the target nodes were also cordoned, which tripped more PDBs, and the rollout stalled for 40 minutes.

Current setup:
- PDBs on every deployment (minAvailable percentages)
- Cluster autoscaler enabled, but scale-up was too slow for the eviction wave
- No maxUnavailable set, relying entirely on minAvailable

Questions:
1. Do you use minAvailable or maxUnavailable for PDBs in practice? Which is safer during upgrades?
2. What's your pod topology spread configuration to avoid the cascading stall?
3. Any experience with 'budgets' in Argo Rollouts for canary + PDB coordination?

Looking for battle-tested defaults, not theoretical best practices.
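The eviction arithmetic behind question 1 can be sketched as a small Python model. This is a simplified illustration, not the disruption controller's actual code. It assumes every pod selected by a PDB belongs to a single Deployment with `replicas` pods and that only Ready pods count as healthy. It rounds percentages up, which is how the Kubernetes PDB documentation describes both minAvailable and maxUnavailable, and it ignores details such as `unhealthyPodEvictionPolicy`. The function names are hypothetical.

```python
from typing import Optional, Union

# A PDB field is either an absolute pod count or a percentage string like "80%".
IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent PDB field against the expected pod count.

    Percentages are rounded up (integer ceiling), matching the documented
    behaviour for both minAvailable and maxUnavailable.
    """
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an int or a percentage string, got {value!r}")
    pct = int(value[:-1])
    return (pct * total + 99) // 100  # ceil(pct * total / 100) without floats


def disruptions_allowed(
    replicas: int,
    healthy: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Simplified model: how many voluntary evictions the PDB permits right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, replicas)
    else:
        desired_healthy = max(0, replicas - scaled_value(max_unavailable, replicas))
    # Pods that were evicted and are still Pending (e.g. no schedulable node)
    # are not healthy, so they shrink the remaining budget.
    return max(0, healthy - desired_healthy)


if __name__ == "__main__":
    for replicas in (2, 3, 5, 10):
        print(
            f"replicas={replicas:>2}  "
            f"minAvailable=80%: {disruptions_allowed(replicas, replicas, min_available='80%')}  "
            f"maxUnavailable=20%: {disruptions_allowed(replicas, replicas, max_unavailable='20%')}  "
            f"maxUnavailable=1: {disruptions_allowed(replicas, replicas, max_unavailable=1)}"
        )
```

Under this model, with 3 replicas `minAvailable: 80%` resolves to 3 pods, so the budget permits zero evictions even when every pod is healthy. With 10 replicas and two evicted pods still Pending on cordoned nodes, the same budget also drops to zero, which is consistent with the stall pattern described above.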
