Kubernetes node autoscaler: Karpenter vs cluster-autoscaler on EKS
Running EKS 1.28 with ~40 nodes across 3 AZs. Currently using cluster-autoscaler, but scale-up latency is killing us: 3-5 minutes from pending pod to ready node.

Considering Karpenter for:
- Faster provisioning (node selection happens at scheduling time)
- Right-sized nodes instead of fixed ASG instance types
- Better handling of GPU workloads

Our constraints:
- Multi-tenant cluster, namespace-based resource quotas
- Spot instances for 60% of workloads (need graceful interruption handling)
- Must support ARM64 (Graviton) for some stateless services

Anyone running Karpenter in production on EKS? What's your actual p95 scale-up time, and did you hit any gotchas with node termination or capacity reservations?