Nginx ingress controller tuning: worker_processes vs HPA on Kubernetes

Question

We're running the community Nginx ingress controller on EKS with ~20K RPS across 40 services. The default `worker_processes auto` ties workers to CPU cores, but our HPA scales pods based on CPU utilization.

The problem: when HPA scales from 2 to 6 pods, each new pod resolves `auto` from its CPU request/limit (500m), which means 1 worker per pod. Total workers jump from 2 to 6, but keep-alive connections are spread unevenly across those workers during the ramp-up.
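
To make the arithmetic concrete, here is a minimal sketch of the worker count before and after the scale event. It assumes the controller resolves `auto` by rounding the pod's CPU limit up to whole cores and falls back to the node's core count when no limit is set. That rule matches what we see (500m gives 1 worker), but it is an assumption here, not something confirmed against the controller source. The function names and the `host_cores` default are made up for illustration.

```python
import math


def workers_for_limit(cpu_limit_millicores: int, host_cores: int) -> int:
    """Workers one pod would start under the assumed rule.

    Assumption: `auto` resolves to ceil(CPU limit in cores); with no
    limit (0 or negative), it falls back to the node's core count.
    """
    if cpu_limit_millicores <= 0:
        return host_cores
    return math.ceil(cpu_limit_millicores / 1000)


def total_workers(replicas: int, cpu_limit_millicores: int, host_cores: int = 4) -> int:
    """Total nginx workers across all ingress pods."""
    return replicas * workers_for_limit(cpu_limit_millicores, host_cores)


if __name__ == "__main__":
    # Replica counts from the scale event described above, 500m limit per pod.
    for replicas in (2, 6):
        print(f"{replicas} pods -> {total_workers(replicas, 500)} workers")
```

Under that assumption the fleet goes from 2 to 6 workers, and each pod only ever has a single worker to absorb whatever connections land on it.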

We observed p99 latency spiking to 800 ms during scale events, then settling after ~90 s.

Two approaches we're testing:
1. Fixed `worker_processes 2` with higher CPU limits (1 core) — more predictable, but wastes cycles on quiet pods
2. Switch to `worker-processes-autoscaling` annotation (nginx-inc feature) — dynamically adjusts but adds controller complexity
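
For comparison, a rough sketch of how option 1 changes the per-pod shape relative to what we run today. `PodShape` and its fields are hypothetical names for this illustration. Option 2 is left out because its behavior depends on the controller and can't be modeled with simple arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodShape:
    """Hypothetical description of one ingress pod configuration."""
    name: str
    workers_per_pod: int
    cpu_limit_cores: float

    def fleet_workers(self, replicas: int) -> int:
        """Total workers across the given number of replicas."""
        return replicas * self.workers_per_pod

    def cores_per_worker(self) -> float:
        """CPU limit available to each worker, in cores."""
        return self.cpu_limit_cores / self.workers_per_pod


# Current setup: `auto` resolving to 1 worker under a 500m limit.
current = PodShape("auto at 500m", workers_per_pod=1, cpu_limit_cores=0.5)
# Option 1: fixed `worker_processes 2` with a 1-core limit.
fixed = PodShape("fixed 2 at 1 core", workers_per_pod=2, cpu_limit_cores=1.0)

if __name__ == "__main__":
    for shape in (current, fixed):
        for replicas in (2, 6):
            print(
                f"{shape.name}: {replicas} pods -> "
                f"{shape.fleet_workers(replicas)} workers, "
                f"{shape.cores_per_worker():.2f} cores/worker"
            )
```

With these numbers both shapes give each worker 0.5 cores. Option 1 simply packs two workers into each pod, so the same worker count is reached with half as many pods.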

Has anyone landed on a stable configuration for this pattern? What's your worker_processes / CPU-request ratio?

Direct answers and proposed approaches

Risks, gaps, and constructive pushback