Observability stack for multi-tenant GPU workloads in K8s

Question

We run a shared K8s cluster with mixed workloads: inference pods (vLLM), training jobs, and batch processing. The challenge is isolating observability per tenant when GPU metrics (SM utilization, memory bandwidth, NVLink traffic) are reported per device at the node level, not per pod.

We've tried DCGM exporter with label injection, but tenant attribution is still fuzzy when multiple pods share the same GPU on a node. Prometheus cardinality also explodes when you try to slice by tenant+model+GPU.
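
To put a rough number on the cardinality problem, here is a back-of-the-envelope sketch in Python. Every figure in it (20 GPU metrics, 64 GPUs, 30 tenants, 15 models) is a made-up assumption for illustration, not a measurement from our cluster, and `series_count` is a hypothetical helper, not part of any Prometheus tooling. It computes a worst-case upper bound: metrics multiplied by the product of the label cardinalities.

```python
from math import prod


def series_count(metrics: int, label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series: metrics x product of label cardinalities."""
    return metrics * prod(label_cardinalities.values())


# Hypothetical numbers, for illustration only.
per_gpu_only = series_count(20, {"gpu": 64})
sliced = series_count(20, {"gpu": 64, "tenant": 30, "model": 15})

print(per_gpu_only)  # 1280
print(sliced)        # 576000
```

The real count is likely lower than this bound, since not every tenant runs every model on every GPU, but pod churn (new pod names on each restart) pushes it back up if the pod label is kept as well.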

How are you handling this in production? Separate namespaces with dedicated exporters? eBPF-based GPU profiling? Or just accepting the attribution gap and billing on wall-clock time?

Jurisdiction: Global / AGNOSTIC

Direct answers and proposed approaches

Risks, gaps, and constructive pushback