Coding
Open
Asked by m0ss
Question

aiohttp vs httpx for high-concurrency scrapers: which handles connection pooling better in production?

I've been running a distributed scraping pipeline at ~200 req/s across 12 containers. We started with aiohttp (ClientSession + TCPConnector) and it has been solid, but connection pool starvation under burst load is real: half the workers time out waiting for a free connection even though the pool size is set to 500.

We switched a canary group to httpx.AsyncClient. Pool management feels tighter out of the box, but we now see occasional SSL handshake timeouts that we never had with aiohttp.

Both teams run Python 3.11 in Alpine containers, and the targets are mostly CDNs with HTTP/2 support. No proxies in the mix.

What's your production experience? Are you seeing similar pool starvation with aiohttp at scale, or have you found a sweet spot with limits/keepalive config? And has anyone done a side-by-side benchmark with realistic scraper workloads (not just synthetic localhost echo servers)?
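For anyone who wants to reason about the starvation symptom without a live target, here is a minimal stdlib-only sketch. It is not our production code and does not use aiohttp or httpx: an `asyncio.Semaphore` stands in for the connection pool, `asyncio.sleep` stands in for the request round trip, and the names `fetch`, `run_burst`, and `simulate`, along with all the numbers, are assumptions made up for illustration.

```python
import asyncio
from typing import Tuple


async def fetch(pool: asyncio.Semaphore, hold: float, acquire_timeout: float) -> bool:
    """Return True if a pool slot was obtained in time, False on a pool timeout."""
    try:
        # Wait for a free "connection"; give up after acquire_timeout seconds.
        await asyncio.wait_for(pool.acquire(), timeout=acquire_timeout)
    except asyncio.TimeoutError:
        return False
    try:
        # Hypothetical stand-in for the request/response round trip.
        await asyncio.sleep(hold)
        return True
    finally:
        # Always hand the slot back, even if the "request" fails.
        pool.release()


async def run_burst(
    pool_size: int, burst: int, hold: float, acquire_timeout: float
) -> Tuple[int, int]:
    """Fire `burst` simultaneous requests at a pool with `pool_size` slots."""
    pool = asyncio.Semaphore(pool_size)
    results = await asyncio.gather(
        *(fetch(pool, hold, acquire_timeout) for _ in range(burst))
    )
    served = sum(results)
    return served, burst - served


def simulate(
    pool_size: int, burst: int, hold: float, acquire_timeout: float
) -> Tuple[int, int]:
    """Synchronous wrapper returning (served, timed_out)."""
    return asyncio.run(run_burst(pool_size, burst, hold, acquire_timeout))


if __name__ == "__main__":
    served, timed_out = simulate(pool_size=5, burst=20, hold=0.1, acquire_timeout=0.15)
    print(f"served={served} timed_out={timed_out}")
```

In this toy model the pool size alone does not determine whether waiters starve. What matters is how long each slot is held relative to how long a waiter is willing to queue. With the numbers above, only two waves of five requests fit inside the 0.15 s wait budget, so the rest of the burst gives up while waiting for a slot.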
