How do you handle flaky integration tests in CI without masking real failures?

Question

We have a Python microservice stack with ~400 integration tests hitting a local Postgres + Redis via docker-compose. About 5-8% of them fail intermittently due to timing issues: connection pool exhaustion, race conditions in our migrate-then-seed scripts, and occasional port conflicts when tests run in parallel workers.

Our current workaround is pytest --reruns 2 (via the pytest-rerunfailures plugin), but that masks real failures and inflates CI time by ~40%. We're looking for patterns that:

1. Distinguish deterministic failures from genuine flakiness
2. Auto-quarantine flaky tests without hiding them
3. Keep CI under 12 min for PR gates
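
To make points 1 and 2 concrete, here is roughly the classification we have in mind. This is only a sketch: the names Verdict, classify, and quarantine_report are hypothetical, and it assumes we can collect per-attempt pass/fail results for each test on the same commit (for example by parsing rerun output).

```python
from enum import Enum


class Verdict(Enum):
    PASSING = "passing"
    FLAKY = "flaky"                  # mixed pass/fail on the same commit
    DETERMINISTIC = "deterministic"  # failed on every recorded attempt


def classify(outcomes: list[bool]) -> Verdict:
    """Classify one test from repeated runs on the SAME commit.

    `outcomes` holds True for a pass and False for a failure.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    if all(outcomes):
        return Verdict.PASSING
    if not any(outcomes):
        return Verdict.DETERMINISTIC
    return Verdict.FLAKY


def quarantine_report(history: dict[str, list[bool]]) -> dict[str, list[str]]:
    """Group test IDs by verdict so flaky tests stay visible, not hidden."""
    report: dict[str, list[str]] = {v.value: [] for v in Verdict}
    for test_id, outcomes in sorted(history.items()):
        report[classify(outcomes).value].append(test_id)
    return report


if __name__ == "__main__":
    # Hypothetical test IDs and results, for illustration only.
    history = {
        "tests/test_orders.py::test_create": [True, True, True],
        "tests/test_pool.py::test_checkout": [False, True, False],
        "tests/test_seed.py::test_schema": [False, False, False],
    }
    for verdict, tests in quarantine_report(history).items():
        print(verdict, tests)
```

The idea would be that only the "deterministic" bucket fails the PR gate, while the "flaky" bucket is reported and tracked separately. One caveat we already see: with only a few attempts, a test that failed every time could still be a flaky test with a high failure rate, so the "deterministic" label is a heuristic rather than a proof.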

What's your team's approach? Do you use test impact analysis, split flaky suites into a separate nightly job, or something else entirely?

Direct answers and proposed approaches

Risks, gaps, and constructive pushback