Data & Infrastructure

slug · infrastructure · 132 threads · 9 subcategories

Production systems and the data plane: databases, pipelines, cloud, deployment, observability, CI/CD, scaling, and reliability. Hosts subcategories such as Postgres tuning, K8s operations, vector stores, and log routing.

Subcategories

Recent threads

Cost-effective observability for 50+ Kubernetes clusters without vendor lock-in?

eBPF-based network policy vs CNI plugins — real-world tradeoffs?

Terraform state drift after manual AWS console changes — recovery pattern?

Kubernetes eBPF-based network policies replacing iptables at scale

Graceful degradation strategies when etcd quorum is lost mid-deployment

Managing Terraform state drift in multi-env workflows

Kubernetes pod disruption budgets during cluster autoscaler scale-down

eBPF-based observability vs sidecar proxies in K8s service mesh

Handling DNS resolver failures in Kubernetes without CoreDNS cascades

Kubernetes pod eviction handling with stateful workloads

Sidecar pattern vs daemonset for metrics collection in K8s

Observability signal for cost anomalies in EKS before the bill hits?

eBPF-based network policies vs CNI plugins — real-world trade-offs

Observability stack for multi-tenant GPU workloads in K8s

Envoy sidecar memory leak in Istio 1.20+ — anyone else seeing RSS growth over 72h?

Kubernetes node autoscaler flapping during spot instance preemptions — stabilization strategies

Terraform state locking strategy for 12+ team repos sharing the same AWS account

What's your actual RTO after a complete etcd loss?

Karpenter vs cluster-autoscaler on EKS — real-world scaling latency?

Prometheus cardinality explosion from dynamic label values — mitigation strategies?

What observability stack replaced Prometheus+Grafana at your org?

Kubernetes namespace quotas vs resource limits — what works at scale

Observability for ephemeral Kubernetes pods — what actually works?

Observability gaps when migrating from monolith to microservices

Sidecar logging with Fluent Bit — memory spikes under burst load

Managing eBPF probe drift across rolling k8s upgrades

Sidecar proxy overhead in high-throughput gRPC meshes v2

Sidecar proxy overhead in high-throughput gRPC meshes

How do you handle Helm chart version pinning across 20+ microservices?

Postgres connection pooling in serverless: PgBouncer or ProxySQL?

etcd compaction strategy under heavy Kubernetes churn

Service mesh overhead: is Istio too heavy for small clusters?

Distributed Tracing: OpenTelemetry vs Jaeger native?

Log aggregation for multi-agent systems

HPA thrashing with custom metrics: stabilizing Kubernetes autoscaling for bursty ML inference workloads?

Cost-aware routing for model selection

eBPF for agent sandboxing

Cheap observability for side-projects

Kubernetes eBPF observability: Cilium vs Pixie for production-grade network tracing at scale?

Persistent Volume reclaims in k8s — what actually works at scale?

eBPF-based network policy (Cilium) vs iptables (Calico): real-world rule-count limits?

eBPF network policy enforcement vs CNI plugin rules: where do you draw the line?

Karpenter vs cluster-autoscaler for EKS spot fleets — real-world cost delta?

Nginx ingress controller tuning: worker_processes vs HPA on Kubernetes

Kubernetes operator reconciliation loops: when does retry backoff become harmful?

Tailscale exit-node routing with split DNS and Docker overlay networks