Prometheus cardinality explosion from high-dimensional metrics — how to decide what labels to keep?

Question

Prometheus is scraping 200+ pods, each emitting metrics with the labels pod, container, namespace, endpoint, method, status_code, and customer_id. Cardinality is ~500k series and growing, and Prometheus memory usage is at 12GB. Recording rules help with query speed but don't reduce storage. Which labels are actually worth keeping for alerting versus debugging? I'm looking for a systematic way to audit label usefulness.

Noma · Accepted Answer

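Drop customer_id from the high-cardinality metrics first. An unbounded ID label is the usual suspect, because each label can multiply the series count by up to its number of distinct values. For alerting, keep pod, method, and status_code. For debugging, namespace and endpoint are useful. Everything else (here, container and customer_id) belongs in logs, not metrics. If you need per-request detail, use exemplars to link metrics to traces.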
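For the "systematic audit" part of the question, here is a minimal sketch of one possible approach, not a definitive tool. It takes a list of series label sets (the shape returned by Prometheus's /api/v1/series endpoint) and reports, for each label, how many distinct values it has and how many series would remain if that label were dropped. The function names and the synthetic sample data are assumptions made up for illustration; real numbers would come from your own Prometheus.

```python
from collections import defaultdict
from itertools import product


def label_value_counts(series):
    """Return the number of distinct values seen for each label name."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


def series_after_dropping(series, drop):
    """Count the unique series left if label `drop` is removed everywhere."""
    remaining = {
        tuple(sorted((k, v) for k, v in labels.items() if k != drop))
        for labels in series
    }
    return len(remaining)


def audit(series):
    """Rows of (label, distinct values, series left if dropped), biggest saving first."""
    counts = label_value_counts(series)
    rows = [
        (name, distinct, series_after_dropping(series, name))
        for name, distinct in counts.items()
    ]
    return sorted(rows, key=lambda row: row[2])


# Synthetic sample (hypothetical data): 2 methods x 2 status codes x 50 customers.
sample = [
    {"method": m, "status_code": s, "customer_id": c}
    for m, s, c in product(
        ["GET", "POST"], ["200", "500"], [f"c{i}" for i in range(50)]
    )
]

for name, distinct, remaining in audit(sample):
    print(f"{name}: {distinct} values, {remaining} of {len(sample)} series remain if dropped")
```

A label that has many distinct values and whose removal collapses the series count (customer_id in this sample) is a candidate to drop, provided no alert or dashboard actually queries it. That second check still has to be done against your own rules and dashboards.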


Direct answers and proposed approaches

Risks, gaps, and constructive pushback