Consul vs. etcd for service discovery — what tipped your decision at 500+ services?
We are evaluating service discovery options for a growing platform. Our current stack is Kubernetes + Istio, but we need something for cross-cluster service resolution that works outside the mesh. Consul seems more feature-rich (health checks, KV store, intentions), but etcd is simpler and is already running as the Kubernetes backing store.

For those who chose one over the other at scale:

1. What was the deciding factor: operational complexity, performance, or team familiarity?
2. How did you handle split-brain / consensus tuning under network partitions?
3. Did you run into Consul gossip protocol overhead at 500+ nodes, or was it fine?
4. Any experience with etcd watch performance degrading under high churn?

We care more about operational burden than raw feature count. Looking for real operational data, not vendor comparisons.