A 24h SLA for critical CVEs (CVSS >= 9.0) is reasonable, and 72h is fine for the tier below that. The key is risk-based triage, not blanket SLAs. We use: CVSS >= 9.0 → 24h, 7.0-8.9 → 72h, < 7.0 → next maintenance window. Auditors usually accept this if you can show the risk assessment process.
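If it helps, here is roughly how those tiers could be written down as code, so the mapping is something you can show an auditor. This is only a sketch: the function name `patch_sla` and the choice to return `None` for "next maintenance window" are my own assumptions, not anything from a particular tool.

```python
from datetime import timedelta
from typing import Optional


def patch_sla(cvss: float) -> Optional[timedelta]:
    """Map a CVSS base score to a patch deadline.

    Returns None when the fix can wait for the next maintenance window.
    (Hypothetical helper; thresholds are the ones quoted in the post.)
    """
    if not 0.0 <= cvss <= 10.0:
        raise ValueError(f"CVSS score out of range: {cvss}")
    if cvss >= 9.0:
        return timedelta(hours=24)
    if cvss >= 7.0:
        return timedelta(hours=72)
    return None  # below 7.0: next maintenance window
```

Keeping the thresholds in one small function also makes it easy to change them later without hunting through ticket templates.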
The whole CVSS-based patching cadence model is broken in practice. A critical CVE that only affects a feature you do not use should not trigger an emergency patch. Conversely, a medium-severity CVE in a component you rely on heavily might be your biggest actual risk. We stopped chasing CVSS scores and started mapping CVEs to our actual attack surface. If the vulnerable code path is not reachable from the internet, it goes in the next scheduled patch window. If it is reachable, we patch within 24 hours regardless of score.
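To make the rule concrete, a minimal sketch of reachability-first triage is below. The `Finding` type, the `internet_reachable` field, and the queue names are all hypothetical; how you actually determine reachability (network maps, code-path analysis, and so on) is up to you and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One CVE as it applies to one of our components (hypothetical shape)."""
    identifier: str
    cvss: float
    # Assumed to come from your own attack-surface mapping.
    internet_reachable: bool


def triage(finding: Finding) -> str:
    """Decide the patch queue from reachability alone.

    The CVSS score is carried along for reporting but deliberately ignored.
    """
    if finding.internet_reachable:
        return "patch-within-24h"
    return "next-patch-window"


# A "critical" score on an unreachable path waits; a "medium" on a reachable one does not.
findings = [
    Finding("EXAMPLE-1", cvss=9.8, internet_reachable=False),
    Finding("EXAMPLE-2", cvss=5.4, internet_reachable=True),
]
queues = {f.identifier: triage(f) for f in findings}
```

The score stays on the record so you can still report it, but it no longer drives the deadline.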
For pod evictions, set appropriate resource requests AND limits. The scheduler uses requests, but the kubelet evicts based on actual usage. We added memory QoS cgroup settings and reduced our OOM kills by 90%. Also, use PodDisruptionBudgets to prevent cascading evictions during node drains.
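One reason requests and limits both matter: Kubernetes derives a pod's QoS class (Guaranteed, Burstable, BestEffort) from them, and that class affects how likely its containers are to be OOM-killed first. The sketch below is a simplified illustration of the documented classification rules, not the kubelet's code. It compares quantities as plain strings (so "1Gi" and "1024Mi" would count as different), and the dict layout is just an assumption for the example. It does not cover the memory QoS cgroup settings mentioned above.

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, as Kubernetes does.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            # Guaranteed needs a limit, and a request equal to it.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Running your pod specs through a check like this in CI is one way to catch BestEffort pods before they land on a busy node.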