Async Python memory leaks: profiling asyncio.Task accumulation in long-running services?

Question

We have a FastAPI service that processes webhook events using asyncio.TaskGroup. After ~48 hours of uptime, memory climbs from ~120 MB to ~800 MB. There is no obvious leak in our code: no growing global caches and no unclosed connections.

I traced the growth to asyncio.Task objects that are never garbage-collected. Even after tasks complete, references to them linger, apparently because our exception handlers hold on to exceptions whose tracebacks keep the task frames alive.
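To make the suspected mechanism concrete, here is a minimal sketch of how a stored exception can pin a coroutine's frame and everything in its locals. The names `Payload`, `kept_errors`, and `handle_event` are hypothetical stand-ins, not taken from our service; the assumption is that something similar to `kept_errors` (an error buffer, a logging context, a retry queue) exists somewhere in the real code.

```python
import asyncio
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large per-event object."""


kept_errors = []    # hypothetical buffer that stores exceptions for later inspection
payload_refs = []   # weak references, used only to observe object lifetime


async def handle_event():
    payload = Payload()                         # lives in this coroutine's frame
    payload_refs.append(weakref.ref(payload))
    raise ValueError("bad event")


async def main():
    task = asyncio.create_task(handle_event())
    await asyncio.wait([task])                  # wait without re-raising
    # Storing the exception also stores exc.__traceback__, which references
    # the handle_event frame and therefore its local variables.
    kept_errors.append(task.exception())


asyncio.run(main())
gc.collect()
alive_while_held = payload_refs[0]() is not None

kept_errors.clear()                             # drop the exception and its traceback
gc.collect()
alive_after_clear = payload_refs[0]() is not None

print(alive_while_held, alive_after_clear)
```

If this is what is happening, storing `repr(exc)` or a formatted traceback string instead of the exception object would avoid keeping the frames alive.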

Tools I've tried:
- tracemalloc: shows allocation sites but doesn't identify the retention chain
- objgraph: helpful for object graphs but doesn't understand asyncio internals
- asyncio.all_tasks(): confirms task count grows with event volume
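One approach that might close the gap between these tools is sketched below. `asyncio.all_tasks()` only returns tasks that have not finished, so it cannot show completed tasks that are still referenced. The sketch walks `gc.get_objects()` to count every Task still alive, then asks `gc.get_referrers()` what is holding one of them. `task_census`, `leaked`, and `worker` are hypothetical names, and the module-level list simulates a leak for demonstration only.

```python
import asyncio
import gc
from collections import Counter


def task_census() -> Counter:
    """Count every Task the GC still tracks, keyed by (coroutine name, state)."""
    census = Counter()
    for obj in gc.get_objects():
        # Check type(obj) rather than isinstance() to avoid triggering
        # __class__ lookups on proxy-like objects.
        if issubclass(type(obj), asyncio.Task):
            coro = obj.get_coro()
            name = getattr(coro, "__qualname__", repr(coro))
            state = "done" if obj.done() else "pending"
            census[(name, state)] += 1
    return census


leaked = []  # simulated leak: finished tasks parked in a module-level list


async def worker():
    await asyncio.sleep(0)


async def main():
    for _ in range(3):
        task = asyncio.create_task(worker())
        leaked.append(task)
        await task


asyncio.run(main())
gc.collect()

census = task_census()
# Type names of the objects that directly reference one retained task.
referrer_types = {type(r).__name__ for r in gc.get_referrers(leaked[0])}

print(census)
print(referrer_types)
```

In a live service, a census like this could be exposed on a debug endpoint and sampled a few hours apart. A growing "done" count for one coroutine name narrows the search, and following the referrers one level at a time gives the retention chain that tracemalloc does not show.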

Has anyone solved this? Is it a known Python 3.12 issue with TaskGroup cleanup? Or are we misusing context managers around our async generators?

Looking for practical profiling approaches, not just 'restart the service'.

Direct answers and proposed approaches

Risks, gaps, and constructive pushback