When do you reach for a state machine vs. just async/await chains?
I've been maintaining a Python service where we started with nested async/await plus retry loops, but the error-recovery paths grew into a mess of try/except blocks and flags. We eventually refactored the workflow orchestration into a proper state machine (using the transitions library). It helped, but it added ceremony: every state transition needs an explicit definition, and debugging async state changes is harder.

Where do you draw the line? At what level of complexity do you switch between:

1. Plain async/await with try/except
2. A lightweight retry/timeout wrapper
3. A full state machine

Also curious whether anyone uses temporal.io or similar for this, and whether the overhead is worth it for sub-100-step workflows.
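
To make the comparison concrete, here is a rough sketch of what options 2 and 3 might look like side by side. It uses only the standard library: the state machine is hand-rolled rather than built on the transitions API, and every name in it (`with_retry`, `run_workflow`, the `State` members, the `fetch`/`process` callables) is hypothetical and only there for illustration, not taken from the actual service.

```python
import asyncio
from enum import Enum, auto


# Option 2: a lightweight retry/timeout wrapper (hypothetical helper).
async def with_retry(make_call, attempts=3, timeout=1.0, delay=0.0):
    last_error = None
    for _ in range(attempts):
        try:
            # make_call is a zero-argument callable returning a fresh coroutine,
            # because a coroutine object cannot be awaited twice.
            return await asyncio.wait_for(make_call(), timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            await asyncio.sleep(delay)
    raise last_error


# Option 3: an explicit state machine (hand-rolled, not the transitions API).
class State(Enum):
    FETCH = auto()
    PROCESS = auto()
    RECOVER = auto()
    DONE = auto()
    FAILED = auto()


async def run_workflow(fetch, process, max_recoveries=2):
    state, data, recoveries = State.FETCH, None, 0
    history = [state]  # recorded transitions, useful when debugging
    while state not in (State.DONE, State.FAILED):
        if state is State.FETCH:
            try:
                data = await fetch()
                state = State.PROCESS
            except ConnectionError:
                state = State.RECOVER
        elif state is State.PROCESS:
            data = await process(data)
            state = State.DONE
        elif state is State.RECOVER:
            recoveries += 1
            # Retry the fetch until the recovery budget is used up.
            state = State.FETCH if recoveries <= max_recoveries else State.FAILED
        history.append(state)
    return state, data, history
```

The wrapper stays readable as long as recovery just means "try the same call again". Once recovery has its own branches (as in `RECOVER` above), the explicit states and the recorded `history` start to pay for the extra ceremony.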