Kubernetes pod disruption budgets causing cascading rollouts during cluster upgrades — safe defaults?

Question

We run ~120 services on EKS. During a recent node group rolling update, our PDBs (minAvailable: 80%) triggered a chain reaction: evicted pods couldn't reschedule because the nodes they would have moved to were also cordoned. The resulting Pending pods tripped more PDBs, and the rollout stalled for 40 minutes.
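To make the stall mechanism concrete, here is a rough sketch of the PDB arithmetic as I understand it. It assumes a percentage minAvailable is applied to the expected replica count and rounded up, and that only Ready pods count as healthy. `disruptions_allowed` is a hypothetical helper, not a Kubernetes API:

```python
def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available_pct: int) -> int:
    """Evictions a PDB with `minAvailable: <pct>%` would currently permit.

    Assumption: the percentage is applied to the expected pod count and
    rounded up, and only healthy (Ready) pods count toward the budget.
    """
    # Integer ceiling of expected_pods * pct / 100, avoiding float rounding.
    desired_healthy = (expected_pods * min_available_pct + 99) // 100
    return max(0, healthy_pods - desired_healthy)


# Hypothetical service: 10 replicas behind a PDB with minAvailable: 80%.
replicas = 10

# All pods Ready: the budget allows two evictions.
print(disruptions_allowed(replicas, healthy_pods=10, min_available_pct=80))  # 2

# The first drain evicts two pods, but their replacements stay Pending
# because the nodes they could land on are cordoned. The budget is now
# exhausted, so further evictions of this service's pods are refused
# until the replacements become Ready.
print(disruptions_allowed(replicas, healthy_pods=8, min_available_pct=80))  # 0
```

Under this model, every drain that touches such a service waits until its Pending pods are scheduled and Ready, so the upgrade moves only as fast as rescheduling (and, here, autoscaler scale-up) allows.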

Current setup:
- PDBs on every deployment (minAvailable percentages)
- Cluster autoscaler enabled, but scale-up was too slow for the eviction wave
- No maxUnavailable set, relying entirely on minAvailable

Questions:
1. Do you use minAvailable or maxUnavailable for PDBs in practice? Which is safer during upgrades?
2. What's your pod topology spread configuration to avoid the cascading stall?
3. Any experience with 'budgets' in Argo Rollouts for canary + PDB coordination?
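On question 1, part of the difference is plain rounding. Below is a rough sketch comparing how many evictions each form allows at different replica counts. It assumes every replica is Ready and, per my reading of the PDB docs, that percentages in both fields are rounded up. The helper names are hypothetical:

```python
def ceil_pct(total: int, pct: int) -> int:
    """pct% of total, rounded up, using integer math only."""
    return (total * pct + 99) // 100


def allowed_min_available(replicas: int, pct: int) -> int:
    # minAvailable: pct%  -> desiredHealthy = ceil(pct% of replicas)
    return max(0, replicas - ceil_pct(replicas, pct))


def allowed_max_unavailable(replicas: int, pct: int) -> int:
    # maxUnavailable: pct% -> desiredHealthy = replicas - ceil(pct% of replicas)
    desired_healthy = max(0, replicas - ceil_pct(replicas, pct))
    return max(0, replicas - desired_healthy)


# Assumes all replicas are Ready when the drain starts.
for replicas in (1, 2, 3, 4, 5, 10, 20):
    print(f"{replicas:>3} replicas: "
          f"minAvailable 80% -> {allowed_min_available(replicas, 80)}, "
          f"maxUnavailable 20% -> {allowed_max_unavailable(replicas, 20)}")
```

With these assumptions, 1 to 4 replicas get 0 allowed evictions under minAvailable: 80% but 1 under maxUnavailable: 20%; at 5, 10 and 20 replicas the two forms agree. The flip side is that maxUnavailable: 20% on a single-replica Deployment permits evicting its only pod.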

Looking for battle-tested defaults, not theoretical best practices.

Direct answers and proposed approaches

Risks, gaps, and constructive pushback