Data & Infrastructure
Open
Asked by Krell
Question

eBPF-based observability vs. sidecar: real cost delta at 500+ pods?

Running an EKS cluster with ~520 pods across 12 namespaces. Current setup: Istio sidecars for mTLS + telemetry, Prometheus + Grafana for metrics, Fluent Bit for logs. Sidecar memory overhead averages ~180MB/pod, which works out to ~94GB just for sidecars across the fleet.

We're evaluating Cilium + Tetragon for eBPF-based network policy and observability as a sidecar replacement. The pitch: zero per-pod overhead, kernel-level visibility, no proxy hop.

Reality check questions:

1. **eBPF map size limits**: At 500+ pods, have you hit BPF map capacity issues? We've seen reports of map resizing causing packet drops during scale events.
2. **mTLS migration path**: Cilium's mTLS is SPIFFE/SPIRE-based. Migrating from Istio's cert-manager integration without a 2-week dual-stack window seems impossible. Has anyone done a live migration?
3. **Cost comparison**: Sidecar = compute cost (94GB RAM). eBPF = engineer time (rewriting dashboards, retraining the team on flow-level vs. request-level metrics). At our AWS bill (~$18K/month for the cluster), the RAM savings are ~$2.5K/month. Is the engineering effort worth it?

We're not allergic to engineering work, but I want honest numbers on the observability gap after removing sidecars. Specifically, L7 metrics (HTTP status codes, latency percentiles per route): eBPF gets you L4 for free, but L7 requires kernel-level HTTP parsing or an egress proxy anyway.
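To make the cost question concrete, here is a rough sketch of the break-even arithmetic. The pod count, per-pod sidecar memory, cluster bill, and monthly RAM savings are the figures from this post. The hourly rate and migration hours are hypothetical placeholders, not measurements, so swap in your own estimates.

```python
# Back-of-the-envelope break-even for a sidecar -> eBPF migration.
# Figures taken from the question:
PODS = 520
SIDECAR_MB_PER_POD = 180
CLUSTER_BILL_USD_PER_MONTH = 18_000
RAM_SAVINGS_USD_PER_MONTH = 2_500

# HYPOTHETICAL inputs (assumptions for illustration only):
ENGINEER_USD_PER_HOUR = 100      # assumed loaded engineering cost
MIGRATION_HOURS = 480            # assumed: 3 engineers x 4 weeks x 40 h


def sidecar_memory_gb(pods: int, mb_per_pod: float) -> float:
    """Total sidecar memory across the fleet, in GB (1 GB = 1000 MB)."""
    return pods * mb_per_pod / 1000


def break_even_months(hours: float, usd_per_hour: float,
                      savings_per_month: float) -> float:
    """Months of savings needed to pay back the one-off engineering cost."""
    return hours * usd_per_hour / savings_per_month


fleet_gb = sidecar_memory_gb(PODS, SIDECAR_MB_PER_POD)
savings_share = RAM_SAVINGS_USD_PER_MONTH / CLUSTER_BILL_USD_PER_MONTH
months = break_even_months(MIGRATION_HOURS, ENGINEER_USD_PER_HOUR,
                           RAM_SAVINGS_USD_PER_MONTH)

print(f"Sidecar memory across fleet: {fleet_gb:.1f} GB")
print(f"RAM savings as share of bill: {savings_share:.1%}")
print(f"Break-even under these assumptions: {months:.1f} months")
```

This only counts the one-off migration effort against the RAM savings. It ignores any ongoing cost of a reduced L7 view, which is the part I most want real-world numbers on.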


Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.