Debugging memory leaks in long-running async Python workers: what's your profiling strategy?
We run a fleet of Celery + asyncio workers that process document pipelines 24/7. After about 48 hours of uptime, RSS grows from 300 MB to 1.2 GB before the OOM killer steps in. We've set max-tasks-per-child as a band-aid, but we'd rather find the root cause.

We tried tracemalloc in production, but the overhead was unacceptable (a 30% slowdown on CPU-heavy parsing tasks). memray gave us good flame graphs, but only for synchronous code paths; our async generator chains don't show up cleanly.

What's your approach for profiling memory in production async Python services?

- Do you use periodic objgraph snapshots and diff them?
- Any experience with asyncio-specific leak detectors?
- Is the industry standard still 'restart often and live with it', or are there tools that actually pinpoint the leaking reference in async code?
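The periodic snapshot-and-diff idea from the objgraph question can be sketched with the standard library alone (objgraph's `show_growth()` works along similar lines). This is a minimal sketch, not a tuned tool: `type_counts`, `diff_counts`, and `snapshot_loop` are hypothetical names, and it assumes the worker process runs an asyncio event loop and can afford a full `gc.collect()` plus a walk of `gc.get_objects()` every few minutes. That walk blocks the event loop while it runs, and it only sees gc-tracked objects (containers and class instances), so leaked `str` or `bytes` buffers show up only through whatever object holds them.

```python
import asyncio
import gc
from collections import Counter


def type_counts() -> Counter:
    """Snapshot: count live gc-tracked objects by fully qualified type name."""
    # Collect first so unreachable cycles awaiting collection are not mistaken for leaks.
    gc.collect()
    return Counter(
        f"{type(obj).__module__}.{type(obj).__qualname__}" for obj in gc.get_objects()
    )


def diff_counts(before: Counter, after: Counter, limit: int = 10) -> list[tuple[str, int]]:
    """Return the types whose live count grew the most between two snapshots."""
    # Counter subtraction drops zero and negative results, so only growth remains.
    return (after - before).most_common(limit)


async def snapshot_loop(interval: float, rounds: int, log=print) -> None:
    """Hypothetical periodic sampler: diff type counts every `interval` seconds."""
    before = type_counts()
    for _ in range(rounds):
        await asyncio.sleep(interval)
        after = type_counts()
        for name, delta in diff_counts(before, after):
            log(f"{name}: +{delta}")
        # Diff against the previous sample, so steady growth shows up every round.
        before = after
```

In a worker this would be started once as a background task, for example `asyncio.create_task(snapshot_loop(600, rounds=10**9, log=logger.info))`, where `logger` is whatever logger the worker already uses. A type whose count grows in every round, rather than spiking and falling back, is the candidate worth tracing further.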
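For the asyncio-specific question, one lightweight check (a sketch under stated assumptions, not an established leak detector) is to periodically count pending tasks and unfinished async generator objects, grouped by the function that created them. The helper names `task_census` and `asyncgen_census` are hypothetical; they rely only on standard-library calls (`asyncio.all_tasks`, `Task.get_coro`, `inspect.isasyncgen`, `gc.get_objects`). A count that only climbs across samples, such as async generators that are started but never exhausted or closed with `aclose()`, points at the code path to inspect next, for example with `gc.get_referrers()`.

```python
import asyncio
import gc
import inspect
from collections import Counter


def task_census() -> Counter:
    """Count pending asyncio tasks by their coroutine's qualified name.

    Must be called while an event loop is running (e.g. from inside a task).
    """
    counts: Counter = Counter()
    for task in asyncio.all_tasks():
        coro = task.get_coro()
        # Fall back to repr() in case the task wraps something without __qualname__.
        counts[getattr(coro, "__qualname__", repr(coro))] += 1
    return counts


def asyncgen_census() -> Counter:
    """Count live async generator objects that have not finished, by qualified name."""
    counts: Counter = Counter()
    for obj in gc.get_objects():
        # ag_frame is None once the generator has finished or been closed, so
        # anything counted here is either not yet started or suspended mid-iteration.
        if inspect.isasyncgen(obj) and obj.ag_frame is not None:
            counts[obj.__qualname__] += 1
    return counts
```

Both counters can be sampled from a periodic background task and compared with Counter subtraction (later minus earlier) to show which ones only grow. Like any `gc.get_objects()` walk, `asyncgen_census` blocks the loop while it runs, so it suits infrequent sampling rather than per-task calls.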