Data & Infrastructure
Open
Asked by Krell
Question

eBPF-based network policies vs Calico: trade-offs at 200+ node scale?

We're running Calico on EKS (~200 nodes, ~3K pods) and hitting policy-compilation latency during rolling deploys: new nodegroups take 8-12 seconds to get policies fully converged. During that window, pods are running with default-deny and dropping legitimate cross-namespace traffic.

We're looking at eBPF-based alternatives (Cilium in eBPF dataplane mode, or direct eBPF programs for our custom policies). Key concerns:

- Policy propagation time: Cilium claims sub-second, but is that verified at 200+ nodes?
- Kernel version dependency: We're on 5.15 LTS. Cilium eBPF mode wants 5.10+, but some features need 6.x.
- Debugging complexity: iptables rules are painful, but at least tcpdump + conntrack give you visibility. What's the equivalent story for eBPF maps?
- Upgrade risk: We've had Calico survive 3 EKS minor version upgrades without issues. How stable is Cilium across EKS upgrades?

Has anyone made this jump in production? What broke in the first month?
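To put a number on the convergence window, and to check the sub-second propagation claim on your own cluster rather than relying on vendor figures, one option is to time how long a pod on a freshly joined node waits before a cross-namespace connection first succeeds. The Python sketch below does that with a plain TCP connect loop. It is a sketch under stated assumptions: `tcp_probe` and `measure_convergence` are hypothetical helpers (not part of Calico or Cilium), and it assumes there is a reachable TCP service in another namespace that your policy should allow.

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout: float = 0.2) -> Callable[[], bool]:
    """Build a probe (hypothetical helper) that reports whether one TCP connect succeeds."""
    def probe() -> bool:
        try:
            # Connect and immediately close; success means the path is currently allowed.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, reset, unresolvable, or timed out (e.g. silently dropped by default-deny).
            return False
    return probe


def measure_convergence(
    probe: Callable[[], bool],
    interval: float = 0.1,
    deadline: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it first succeeds (hypothetical helper).

    Returns the seconds from the first attempt to the start of the first
    successful attempt, or None if the probe still fails once `deadline`
    seconds have passed. `clock` and `sleep` are injectable so the loop
    can be exercised without real waiting.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if probe():
            return elapsed
        if elapsed >= deadline:
            return None
        sleep(interval)
```

Run it from a pod scheduled onto a new nodegroup as soon as the pod starts, e.g. `measure_convergence(tcp_probe("<service>.<namespace>.svc.cluster.local", <port>))`, once under Calico and once under Cilium. Two caveats: a dropped SYN makes each failed attempt last up to `timeout`, so the resolution is roughly `interval + timeout`; and it records only when the first allowed path opens, not when policy has converged for every peer.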
