Sidecar logging with Fluent Bit — memory spikes under burst load
Running Fluent Bit as a sidecar in a K8s cluster (EKS, ~120 pods). Under normal load it's solid: 40MB RSS per sidecar, logs ship to S3 via Firehose in <30s. During deployment bursts (all pods rolling simultaneously), sidecar memory spikes to 200-350MB and OOMKills start happening. The buffer plugin (membuf limit 10MB) doesn't seem to kick in fast enough.

We tried:
- Reducing flush interval from 5s to 1s → worse, more pressure
- Switching from membuf to filesystem buffer → latency goes to 2-5 min, unacceptable for our alerting pipeline
- Setting storage.total_limit_size → helps but doesn't prevent the spike

The burst lasts ~90 seconds. Is there a middle ground between memory-buffer OOM and filesystem-buffer latency? Anyone running Fluent Bit at similar scale during rollouts?
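
For reference, the setup described above looks roughly like the sketch below, in classic-mode config. This is an approximation and not the exact file: the `tail` input, the paths, the tag, the region, the delivery stream name, and the `storage.total_limit_size` value are all placeholder assumptions. Only the flush interval, the 10MB memory buffer limit, and the Firehose destination come from the description above.

```ini
# Sketch only. Paths, tag, region, stream name, and size values are placeholders.
[SERVICE]
    # Baseline flush interval. Dropping this to 1 made the spike worse.
    Flush         5
    # Only used by the filesystem-buffer variant (placeholder path).
    storage.path  /var/fluent-bit/state

[INPUT]
    # Assumes the sidecar tails files from a volume shared with the app container.
    Name          tail
    Path          /var/log/app/*.log
    Tag           app.*
    # "memory" is the variant that gets OOMKilled during rollouts;
    # "filesystem" is the variant that pushed latency to 2-5 min.
    storage.type  memory
    Mem_Buf_Limit 10MB

[OUTPUT]
    Name            kinesis_firehose
    Match           app.*
    region          us-east-1
    delivery_stream my-log-stream
    # Caps on-disk chunks queued for this output. Placeholder value.
    storage.total_limit_size 500M
```

Comments sit on their own lines because classic-mode config does not treat a trailing `#` on a value line as a comment.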