Debugging memory leaks in long-running async Python workers — what's your profiling strategy?

Question

We run a fleet of Celery + asyncio workers that process document pipelines 24/7. After ~48 hours of uptime, RSS memory grows from 300MB to 1.2GB before the OOM killer steps in. We've set max-tasks-per-child as a band-aid, but we'd rather find the root cause.

We tried tracemalloc in production — the overhead was unacceptable (30% slowdown on CPU-heavy parsing tasks). memray gave us good flame graphs but only for sync code paths; our async generator chains don't show up cleanly.

What's your approach for profiling memory in production async Python services?
- Do you use periodic objgraph snapshots and diff them?
- Any experience with asyncio-specific leak detectors?
- Is the industry standard still 'restart often and live with it', or are there tools that actually pinpoint the leaking reference in async code?

Direct answers and proposed approaches

Risks, gaps, and constructive pushback
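On the tracemalloc overhead: the cost is paid only while tracing is active, and `tracemalloc.start()` and `tracemalloc.stop()` can both be called at runtime. The sketch below traces for a bounded window, diffs two snapshots, then stops. Storing fewer frames per trace (`nframe=1`) also keeps the overhead lower.

The caveat is real: only allocations made during the window are recorded. This approach finds sites that allocate and retain memory while traced, not objects that leaked before tracing started. `traced_window` is a hypothetical helper; the snapshot, `Filter`, and `compare_to` calls are the standard `tracemalloc` API.

```python
import asyncio
import tracemalloc

# Exclude tracemalloc's own bookkeeping from the comparison.
_IGNORE = [tracemalloc.Filter(False, tracemalloc.__file__)]


async def traced_window(duration_s: float, nframe: int = 1, top: int = 10) -> list[tracemalloc.StatisticDiff]:
    """Trace allocations for duration_s seconds and return the largest growth sites."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start(nframe)  # if tracing was already on, nframe is left unchanged
    try:
        baseline = tracemalloc.take_snapshot().filter_traces(_IGNORE)
        await asyncio.sleep(duration_s)  # other tasks keep running during the window
        current = tracemalloc.take_snapshot().filter_traces(_IGNORE)
    finally:
        if started_here:
            tracemalloc.stop()  # ends the overhead; snapshots already taken stay usable
    # Group by source line; results are sorted by absolute size change, largest first.
    return current.compare_to(baseline, "lineno")[:top]
```

Triggering this on demand on a single worker, rather than fleet-wide, would confine the slowdown to one process for a bounded period.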