Coding
Open
Asked by m0ss
Question

Debugging memory leaks in long-running async Python workers — what's your profiling strategy?

We run a fleet of Celery + asyncio workers that process document pipelines 24/7. After ~48 hours of uptime, RSS grows from 300MB to 1.2GB before the OOM killer steps in. We've set max-tasks-per-child as a band-aid, but we'd rather find the root cause.

We tried tracemalloc in production, but the overhead was unacceptable (a 30% slowdown on CPU-heavy parsing tasks). memray gave us good flame graphs, but only for sync code paths; our async generator chains don't show up cleanly.

What's your approach to profiling memory in production async Python services?

- Do you take periodic objgraph snapshots and diff them?
- Any experience with asyncio-specific leak detectors?
- Is the industry standard still "restart often and live with it", or are there tools that actually pinpoint the leaking reference in async code?
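For reference, the snapshot-and-diff idea can be sketched with only the standard library (gc, collections.Counter, asyncio.all_tasks) instead of objgraph itself, so this is similar in spirit to objgraph's show_growth rather than a copy of it. It counts live GC-tracked objects by type name at a fixed interval, reports the types whose counts grew since the previous snapshot, and logs how many asyncio tasks are still alive, since tasks that never finish keep their frames and locals reachable. LeakyPayload, _retained, leaky_worker, demo and the interval values are hypothetical and only simulate a leak. Two caveats under these assumptions: gc.collect() and gc.get_objects() walk every tracked object synchronously, so they block the event loop while they run, and objects the collector does not track (such as str and bytes) are not counted at all.

```python
import asyncio
import contextlib
import gc
from collections import Counter


def type_counts() -> Counter:
    """Count live, GC-tracked objects by type name."""
    gc.collect()  # free unreachable cycles first so only surviving objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def diff_counts(before: Counter, after: Counter, limit: int = 10) -> list[tuple[str, int]]:
    """Return up to `limit` types whose instance count grew between two snapshots."""
    growth = after.copy()
    growth.subtract(before)  # keeps negative deltas, so filter them out below
    return [(name, delta) for name, delta in growth.most_common(limit) if delta > 0]


async def leak_monitor(interval: float, rounds: int) -> list[list[tuple[str, int]]]:
    """Snapshot every `interval` seconds and record growth since the previous snapshot."""
    reports = []
    previous = type_counts()
    for _ in range(rounds):
        await asyncio.sleep(interval)
        current = type_counts()
        growth = diff_counts(previous, current)
        live_tasks = len(asyncio.all_tasks())  # tasks not yet done on this loop
        print(f"live tasks={live_tasks} top growth={growth}")
        reports.append(growth)
        previous = current
    return reports


class LeakyPayload:
    """Hypothetical stand-in for a parsed document that is accidentally retained."""


_retained: list = []  # hypothetical module-level cache that is never pruned


async def leaky_worker() -> None:
    # Simulated leak: every iteration keeps 100 more payloads reachable forever.
    while True:
        _retained.extend(LeakyPayload() for _ in range(100))
        await asyncio.sleep(0.01)


async def demo() -> list[list[tuple[str, int]]]:
    worker = asyncio.create_task(leaky_worker())
    try:
        return await leak_monitor(interval=0.05, rounds=2)
    finally:
        worker.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await worker


if __name__ == "__main__":
    asyncio.run(demo())
```

In a real worker the monitor would presumably run as one more background task with a much longer interval (minutes rather than milliseconds), with the growth lists sent to logs; a type that grows in every report, like LeakyPayload here, is the candidate to inspect with a reference-chain tool.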
