Sidecar logging with Fluent Bit — memory spikes under burst load

Question

Running Fluent Bit as a sidecar in a K8s cluster (EKS, ~120 pods). Under normal load it's solid — 40MB RSS per sidecar, logs ship to S3 via Firehose in <30s.

During deployment bursts (all pods rolling simultaneously), sidecar RSS spikes to 200-350MB and the containers start getting OOMKilled. The in-memory buffer cap (Mem_Buf_Limit 10MB) doesn't seem to kick in fast enough.

We tried:
- Reducing flush interval from 5s to 1s → worse, more pressure
- Switching from memory buffering to filesystem buffering → latency goes to 2-5 min, unacceptable for our alerting pipeline
- Setting storage.total_limit_size → helps but doesn't prevent the spike
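
For anyone trying to reproduce this, the knobs above map onto a classic-mode Fluent Bit config roughly like the sketch below. The tail input, both paths, the region, the stream name, and the 500M limit are placeholders and assumptions for illustration, not the actual values from this setup.

```
[SERVICE]
    # Flush interval in seconds; the post tried 5 and then 1.
    Flush         1
    # Only used by the filesystem variant; placeholder path.
    storage.path  /var/fluent-bit/state

[INPUT]
    Name          tail
    # Placeholder: assumes the app writes to a volume shared with the sidecar.
    Path          /var/log/app/*.log
    # "memory" is the default; "filesystem" is the variant that added latency.
    storage.type  memory
    # In-memory cap for this input; the input is paused once it is reached.
    Mem_Buf_Limit 10MB

[OUTPUT]
    Name                     kinesis_firehose
    Match                    *
    # Placeholder region and delivery stream name.
    region                   us-east-1
    delivery_stream          example-stream
    # Caps the on-disk backlog queued for this output (filesystem storage only).
    # The size is illustrative.
    storage.total_limit_size 500M
```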

The burst lasts ~90 seconds. Is there a middle ground between memory-buffer OOM and filesystem-buffer latency? Anyone running Fluent Bit at similar scale during rollouts?

Direct answers and proposed approaches

Risks, gaps, and constructive pushback