eBPF-based observability vs. sidecar: real cost delta at 500+ pods?

Question

Running an EKS cluster with ~520 pods across 12 namespaces. Current setup: Istio sidecars for mTLS + telemetry, Prometheus + Grafana for metrics, Fluent Bit for logs. Sidecar memory overhead averages ~180MB/pod, which works out to ~94GB for sidecars alone across the fleet.

We're evaluating Cilium + Tetragon for eBPF-based network policy and observability as a sidecar replacement. The pitch: no per-pod overhead, kernel-level visibility, no proxy hop.

Reality check questions:
1. **eBPF map size limits**: At 500+ pods, have you hit BPF map capacity issues? We've seen reports of map resizing causing packet drops during scale events.
2. **mTLS migration path**: Cilium's mutual authentication is SPIFFE/SPIRE-based. Migrating from Istio's cert-manager integration looks impossible to us without a dual-stack window of at least 2 weeks. Has anyone done a live migration?
3. **Cost comparison**: Sidecar = compute cost (94GB RAM). eBPF = engineer time (rewriting dashboards, retraining the team on flow-level vs. request-level metrics). Against our AWS bill (~$18K/month for the cluster), the RAM savings are ~$2.5K/month. Is the engineering effort worth it?
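
To make the arithmetic behind question 3 explicit, here is a rough break-even sketch. The pod count, sidecar memory, savings, and cluster bill are the figures quoted above. The migration effort (`ENGINEER_WEEKS`) and the loaded cost per engineer-week are placeholders I made up, not estimates, so swap in your own.

```python
# Back-of-envelope cost model for the sidecar vs. eBPF question.
# "from the post" = figure quoted in the question; "assumption" = placeholder.

PODS = 520                       # from the post
SIDECAR_MB_PER_POD = 180         # from the post (fleet average)
RAM_SAVINGS_PER_MONTH = 2_500    # USD, from the post
CLUSTER_BILL_PER_MONTH = 18_000  # USD, from the post

# Hypothetical inputs: replace with real numbers before drawing conclusions.
ENGINEER_WEEKS = 12              # assumption: total migration effort
COST_PER_ENGINEER_WEEK = 4_000   # USD, assumption: loaded cost


def sidecar_fleet_gb(pods: int, mb_per_pod: float) -> float:
    """Total sidecar memory across the fleet, in GB (1 GB = 1000 MB)."""
    return pods * mb_per_pod / 1000


def break_even_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to pay back a one-time migration cost."""
    return one_time_cost / monthly_savings


fleet_gb = sidecar_fleet_gb(PODS, SIDECAR_MB_PER_POD)
savings_share = RAM_SAVINGS_PER_MONTH / CLUSTER_BILL_PER_MONTH
migration_cost = ENGINEER_WEEKS * COST_PER_ENGINEER_WEEK
months = break_even_months(migration_cost, RAM_SAVINGS_PER_MONTH)

print(f"sidecar RAM across fleet: {fleet_gb:.1f} GB")
print(f"savings as share of bill: {savings_share:.1%}")
print(f"hypothetical migration cost: ${migration_cost:,}")
print(f"break-even: {months:.1f} months")
```

With those placeholder inputs the payback period is well over a year, which is why I'm asking whether the RAM savings alone justify the move. The model ignores ongoing costs on both sides (sidecar upgrades, Cilium operations), so treat it as a lower bound on the analysis rather than an answer.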

We're not allergic to engineering work, but I want honest numbers on the observability gap after removing sidecars. Specifically, L7 metrics (HTTP status codes, latency percentiles per route): eBPF gets you L4 for free, but as far as I can tell L7 requires either kernel-level HTTP parsing or an egress proxy anyway.
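
To be concrete about what I mean by L7 metrics, here is a minimal sketch of the per-route summary our dashboards depend on today. The `RequestRecord` shape and the sample data are made up for illustration and are not taken from Istio or Cilium. The point is that every field it needs (route, status, per-request latency) lives at the request level, and an L4 flow record (addresses, ports, byte counts) carries none of them.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical request-level (L7) record; field names are illustrative."""
    route: str
    status: int
    latency_ms: float


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def per_route_summary(records):
    """Group requests by route and compute count, p50, p99, and 5xx count."""
    latencies = defaultdict(list)
    server_errors = defaultdict(int)
    for rec in records:
        latencies[rec.route].append(rec.latency_ms)
        if rec.status >= 500:
            server_errors[rec.route] += 1
    return {
        route: {
            "count": len(vals),
            "p50_ms": nearest_rank(vals, 50),
            "p99_ms": nearest_rank(vals, 99),
            "5xx": server_errors[route],
        }
        for route, vals in latencies.items()
    }


# Made-up sample data, only to show the shape of the output.
records = [
    RequestRecord("/checkout", 200, 40.0),
    RequestRecord("/checkout", 200, 55.0),
    RequestRecord("/checkout", 503, 900.0),
    RequestRecord("/health", 200, 2.0),
]
summary = per_route_summary(records)
for route, stats in summary.items():
    print(route, stats)
```

If the eBPF path can only hand us flow-level records, none of the inputs to `per_route_summary` exist, and that is the gap I'd like real numbers on.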

Direct answers and proposed approaches

Risks, gaps, and constructive pushback