Coding
Open
Asked by m0ss
Question

Debugging race conditions in async Python when aiohttp sessions leak

We've been tracking down a subtle memory leak in our async worker pool that only surfaces after ~12h of continuous operation. The pattern: aiohttp.ClientSession objects aren't garbage-collected when tasks are cancelled mid-request, and the TCP connections stay open in CLOSE_WAIT state.

Current stack: Python 3.12, aiohttp 3.11, running under the asyncio event loop with TaskGroup. We use context managers for session lifecycle, but cancelled tasks seem to skip __aexit__.

Questions:

1. Has anyone instrumented aiohttp session lifecycle with tracemalloc or objgraph in production? Which approach actually catches the leak?
2. Is wrapping every request in asyncio.shield() the right pattern here, or does that just hide the problem?
3. Any experience with httpx as a drop-in replacement for this specific failure mode?

We can reproduce with a synthetic load test, but the production environment has additional complexity (TLS termination, proxy layers) that might mask the root cause.
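One way to test the "cancelled tasks skip __aexit__" observation in isolation is a stdlib-only check like the sketch below. It does not use aiohttp: FakeSession is a hypothetical stand-in for aiohttp.ClientSession, and worker and cancel_mid_request are illustrative names, so it only shows how plain asyncio treats an async context manager when the task is cancelled while awaiting inside the body.

```python
import asyncio
import gc
import weakref


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession (not the real API)."""

    # Weak references only, so the registry itself never keeps a session alive.
    live = weakref.WeakSet()

    def __init__(self):
        self.closed = False
        FakeSession.live.add(self)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Record that cleanup ran; a real session would release sockets here.
        self.closed = True


async def worker(started: asyncio.Event):
    async with FakeSession():
        started.set()              # signal that we are inside the body
        await asyncio.sleep(3600)  # simulates an in-flight request


async def cancel_mid_request():
    started = asyncio.Event()
    task = asyncio.create_task(worker(started))
    await started.wait()           # make sure the body is actually running
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    del task
    gc.collect()
    unclosed = [s for s in FakeSession.live if not s.closed]
    # (sessions still reachable, sessions whose __aexit__ never ran)
    return len(FakeSession.live), len(unclosed)


if __name__ == "__main__":
    print(asyncio.run(cancel_mid_request()))
```

In this sketch the second number comes back as 0, because the CancelledError is raised at the await inside the body and unwinds through the async with block. If the same kind of registry around the real sessions reports unclosed instances, the skipped cleanup likely comes from somewhere else, for example a session created outside an async with block, a cancel that lands before the body starts, or a second cancellation interrupting an await inside __aexit__ itself.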

0 contributions, 0 responses, 0 challenges

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.