mTLS sidecar injection causing 503 cascades during rolling deployments — warm-up sequence?

Question

After adding an Envoy-based mTLS sidecar to our service mesh, rolling deployments started returning 503s on roughly 15% of requests for 30-60 seconds. The sidecar isn't ready to accept connections yet when Kubernetes marks the pod as Running.

What we've tried:
- readinessProbe on the sidecar's admin port (reduces the 503s but doesn't eliminate them)
- postStart hook with a curl loop to the local mTLS endpoint
- connection draining with a 10s preStop sleep
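
For anyone comparing notes, the wait loop from the second bullet can be sketched roughly as below. This is a minimal illustration, not our exact hook: the function name `wait_until_ready`, the `required_successes` threshold, and the default timings are all assumptions made up for this example. The probe is passed in as a callable, so it can be an HTTP check, a TLS handshake, or anything else that returns True/False.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    required_successes: int = 3,  # hypothetical threshold, tune per service
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds several times in a row or time runs out.

    Requiring consecutive successes guards against a listener that answers
    once and then flaps while the sidecar is still settling.
    """
    deadline = time.monotonic() + timeout_s
    streak = 0
    while time.monotonic() < deadline:
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # any failure resets the run of successes
        sleep(interval_s)
    return False
```

Note that a single successful probe is not treated as ready here; whether three consecutive successes is the right bar depends on how the sidecar behaves during startup.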

The issue seems to be that the old pod's sidecar closes its listener before the new pod's sidecar has finished certificate rotation and is accepting traffic.

How did you sequence the warm-up to avoid the gap? Did you pre-warm the cert before the pod becomes visible to the Service, or did you switch to a different mTLS approach?

Direct answers and proposed approaches
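
One thing worth checking, offered as a sketch rather than a confirmed fix: a probe against the admin port only shows that the sidecar process is up, not that the mTLS listener can complete a handshake with a loaded cert. A probe that performs a real client handshake might close part of the gap. The function name `mtls_handshake_ok` and all file paths are hypothetical; only Python standard library calls are used.

```python
import socket
import ssl


def mtls_handshake_ok(
    host: str,
    port: int,
    ca_file: str,
    cert_file: str,
    key_file: str,
    timeout_s: float = 1.0,
) -> bool:
    """Return True only if a full mutual-TLS handshake succeeds.

    Any failure (missing cert files, refused connection, handshake error)
    is reported as "not ready" instead of raising.
    """
    try:
        # Trust only the mesh CA and present this workload's client cert.
        ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            # wrap_socket performs the handshake before returning.
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() is not None
    except (OSError, ValueError):
        # ssl.SSLError and FileNotFoundError are both subclasses of OSError.
        return False
```

Hostname verification is left on here, so `host` has to match a name in the sidecar's certificate; if the mesh identifies workloads by SPIFFE URI instead of DNS name, this check would need adjusting.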

Risks, gaps, and constructive pushback