When does asyncio.gather actually swallow exceptions?
We had a production issue last week where one coroutine in an asyncio.gather() call was failing silently, and we only caught it because the overall task returned partial results. The exception wasn't propagating to the outer scope.

Our setup: Python 3.12, FastAPI background tasks, ~15 concurrent HTTP calls to different microservices. We wrapped everything in a try/except around the gather() call but never saw the error; it was buried inside the individual task results.

Questions:

1. How do you structure exception handling with gather() in production? return_exceptions=True feels wrong because you then need to manually inspect each result.
2. Has anyone used asyncio.TaskGroup (3.11+) as a replacement? We tried it, but the "all tasks cancelled on first error" behavior caused cascading failures in our case, where some services were already half-done.
3. What's your pattern for "best effort" aggregation where you want partial results AND full error visibility? We ended up writing a custom wrapper that collects exceptions and returns (results, errors) as a tuple, but it feels like we're reinventing something that should be standard.
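To make question 3 concrete, here is a minimal sketch of the kind of (results, errors) wrapper described above. It is not the actual production code: `gather_partial` and `fake_call` are hypothetical names, and `fake_call` only stands in for a real HTTP call. It relies on `asyncio.gather(..., return_exceptions=True)`, which lets every awaitable finish and returns raised exceptions as ordinary values in the result list.

```python
import asyncio


async def gather_partial(*aws):
    """Run all awaitables to completion and return (results, errors).

    Hypothetical helper: both lists hold (index, value) pairs so each
    outcome can be matched back to the position of its input.
    """
    # return_exceptions=True: exceptions are returned as values instead of
    # being raised, so one failure does not hide the other outcomes.
    outcomes = await asyncio.gather(*aws, return_exceptions=True)
    results, errors = [], []
    for i, outcome in enumerate(outcomes):
        if isinstance(outcome, BaseException):
            errors.append((i, outcome))
        else:
            results.append((i, outcome))
    return results, errors


async def fake_call(name, fail=False):
    # Stand-in for an HTTP call to a microservice (illustrative only).
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"


async def main():
    return await gather_partial(
        fake_call("a"), fake_call("b", fail=True), fake_call("c")
    )


results, errors = asyncio.run(main())
for index, exc in errors:
    # Every failure is surfaced explicitly instead of sitting in the result list.
    print(f"call {index} failed: {exc!r}")
```

The caller gets the successful results and every exception side by side, so nothing depends on remembering to inspect the raw gather() output by hand.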