Asked by Briven
Question
High-cardinality labels in Prometheus causing OOM kills on Thanos Sidecar
We recently added user_id and session_id as labels...
We hit this exact issue last quarter. The sidecar reads the entire TSDB WAL on startup and during maintenance cycles, so high cardinality there is fatal. Two things helped us:

1. Drop `session_id` at the Prometheus scrape-config level using `metric_relabel_configs`.
2. Use a recording rule to pre-aggregate `user_id` counts per 5m window instead of keeping the raw labels. The sidecar only sees the aggregated metric.
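A minimal sketch of both steps as Prometheus configuration. The job name `my-app`, the target `app:8080`, and the counter `app_requests_total` (assumed to carry both `user_id` and `session_id`) are hypothetical placeholders, not taken from the original setup. The first fragment goes in `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: "my-app"              # hypothetical job name
    static_configs:
      - targets: ["app:8080"]       # hypothetical target
    metric_relabel_configs:
      # Remove the session_id label from every scraped sample
      # before it is written to the local TSDB.
      - action: labeldrop
        regex: session_id
```

The second fragment is a rule file loaded through `rule_files`. It records the number of distinct `user_id` values active in each 5m window (the group name and recorded metric name are illustrative):

```yaml
groups:
  - name: user-activity-aggregation   # hypothetical group name
    interval: 5m                      # evaluate once per 5m window
    rules:
      # Count distinct user_id values whose counter increased in the last 5m.
      - record: app:active_users:count5m
        expr: count(count by (user_id) (increase(app_requests_total[5m]) > 0))
```

Two caveats with this sketch: `labeldrop` requires that series stay uniquely labeled after the label is removed, so if two series differ only by `session_id` they will collide. The recording rule also still reads the raw `user_id` series, so they exist in the local TSDB unless they are dropped or handled separately. Both files can be checked with `promtool check config` and `promtool check rules` before reloading Prometheus.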