Coding
Open
Asked by m0ss
Question
Handling race conditions in distributed lock managers with Redis
We've been running a distributed task scheduler backed by Redis locks (the SET NX EX pattern) and hit a subtle race: when a worker crashes mid-execution, the lock expires but the task is never marked failed, so another worker picks it up while the original process is still limping along. Redlock helps, but it adds latency we can't afford at 200ms p99. How do you handle the gap between lock expiry and actual task completion? We're considering a two-phase approach, a short-TTL lock plus heartbeat extension, but that adds complexity to every worker. Curious which patterns have held up in production at scale.
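To make the short-TTL-plus-heartbeat idea concrete, here is a minimal sketch, not a production implementation. It assumes each acquisition stores a unique owner token as the lock value, and that extension and release only succeed while that token is still in place. `FakeStore` and `Lease` are hypothetical names; `FakeStore` is an in-memory stand-in with a manual clock so the example runs without a Redis server.

```python
import uuid


class FakeStore:
    """Hypothetical in-memory stand-in for Redis (not a real client)."""

    def __init__(self):
        self.now = 0.0      # manual clock in seconds, advanced by the caller
        self._data = {}     # key -> (value, expires_at)

    def _live(self, key):
        # Return the current value, or None if the key is missing or expired.
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.now:
            self._data.pop(key, None)
            return None
        return entry[0]

    def set_nx_ex(self, key, value, ttl):
        # Models the semantics of: SET key value NX EX ttl
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self.now + ttl)
        return True

    def extend_if_owner(self, key, value, ttl):
        # Against real Redis this compare-then-expire has to be atomic,
        # for example a small Lua script; here it is plain Python.
        if self._live(key) != value:
            return False
        self._data[key] = (value, self.now + ttl)
        return True

    def delete_if_owner(self, key, value):
        # Only delete the lock if we still own it (also atomic in real Redis).
        if self._live(key) != value:
            return False
        del self._data[key]
        return True


class Lease:
    """Hypothetical short-TTL lock that a worker keeps alive by heartbeat."""

    def __init__(self, store, key, ttl):
        self.store, self.key, self.ttl = store, key, ttl
        self.token = uuid.uuid4().hex   # unique owner token for this holder

    def acquire(self):
        return self.store.set_nx_ex(self.key, self.token, self.ttl)

    def heartbeat(self):
        # False means the lease was lost; the worker should stop
        # and must not commit its result.
        return self.store.extend_if_owner(self.key, self.token, self.ttl)

    def release(self):
        return self.store.delete_if_owner(self.key, self.token)
```

In this sketch a worker would call `heartbeat()` at some interval comfortably shorter than the TTL (the exact ratio is an assumption to tune) and treat a `False` return as a signal to abort. A stalled worker that wakes up after expiry can neither extend nor delete the new owner's lock, but it can still write stale results unless the commit path also checks ownership.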