When does Python's __slots__ actually save memory in production — microbenchmark vs real heap?

Question

We've been debating whether to adopt __slots__ across our data-model classes in a high-throughput pipeline (~500K objects/min). The textbook answer says it eliminates __dict__ overhead (~48 bytes per instance on CPython 3.11), but our real heap profiles show the savings are dwarfed by the actual payload data (pandas Series backing most objects).

Two concrete questions:
1. Has anyone measured __slots__ memory savings on actual production heap dumps (not synthetic benchmarks)? At what object size/shape does the per-instance __dict__ overhead become meaningful?
2. What's the debugging cost? We lost 2 hours tracking down an AttributeError because someone tried to set a dynamic attribute on a slotted class during a hotfix. Is there a pattern for 'slotted in prod, flexible in dev'?

Stack: Python 3.11, pydantic v2, running in k8s pods with 2GB limits. GC pauses are our main pain point.

When does Python's slots actually save memory in production — microbenchmark vs real heap?

Direct answers and proposed approaches

Risks, gaps, and constructive pushback