When do you reach for a custom parser vs regex for structured log extraction?
We process ~2 GB of heterogeneous app logs daily (JSON, syslog, custom formats). Our current approach uses regex chains for field extraction, but maintenance is becoming painful: every new log format adds another regex and another edge case.

I'm weighing two paths:

1. **Custom parser approach**: Write a small state-machine parser per format, composable in a pipeline. More code upfront, but each parser is testable and version-controlled independently.
2. **Regex DSL**: Build a config layer on top of regex with named groups and validation rules. Faster to iterate, but you eventually hit the "regex that parses regex" wall.

Has anyone crossed the threshold where regex becomes net-negative for log extraction at scale? What was your signal that it was time to switch?

Tech context: Python 3.11, streaming ingestion, need sub-second extraction per batch.
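For concreteness, here is a minimal sketch of both options side by side, assuming Python 3.11 and line-oriented input. Everything in it is hypothetical and for illustration only: the names `RegexSpec`, `SYSLOG`, `parse_json`, `parse_kv`, `PARSERS`, and `extract`, the simplified syslog-style pattern (not a full RFC 3164 parser), and the `key=value` format. The point it tries to show is that if every parser, regex-backed or hand-written, shares one signature (line in, dict or `None` out), both kinds compose in the same first-match pipeline.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator, Optional

Record = dict[str, Any]
Parser = Callable[[str], Optional[Record]]  # line -> fields, or None if the format doesn't match


# Option 2 (regex DSL): a declarative spec of named groups plus per-field validators.
@dataclass
class RegexSpec:
    pattern: str
    validators: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self._rx = re.compile(self.pattern)

    def __call__(self, line: str) -> Optional[Record]:
        m = self._rx.match(line)
        if m is None:
            return None
        rec = {k: v for k, v in m.groupdict().items() if v is not None}
        if any(k in rec and not ok(rec[k]) for k, ok in self.validators.items()):
            return None  # a declared validator rejected a captured field
        return rec


SYSLOG = RegexSpec(  # hypothetical, simplified syslog-style spec (not full RFC 3164)
    r"<(?P<pri>\d{1,3})>(?P<ts>\w{3} [ \d]\d \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<app>[^:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)",
    validators={"pri": lambda v: int(v) <= 191},
)


def parse_json(line: str) -> Optional[Record]:
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Option 1 (custom parser): character-level state machine for key=value lines, quoted values, \ escapes.
def parse_kv(line: str) -> Optional[Record]:
    rec: Record = {}
    state, key, buf = "KEY", "", []
    for ch in line.strip():
        if state == "KEY":
            if ch == "=" and buf:
                key, buf, state = "".join(buf), [], "VALUE_START"
            elif ch == "=" or (ch.isspace() and buf):
                return None  # empty key, or a bare word without '=': not this format
            elif not ch.isspace():
                buf.append(ch)
        elif state == "VALUE_START":
            if ch == '"':
                state = "QUOTED"
            elif ch.isspace():
                rec[key], state = "", "KEY"  # empty value
            else:
                buf.append(ch)
                state = "VALUE"
        elif state == "VALUE":
            if ch.isspace():
                rec[key], buf, state = "".join(buf), [], "KEY"
            else:
                buf.append(ch)
        elif state == "QUOTED":
            if ch == "\\":
                state = "ESCAPE"
            elif ch == '"':
                rec[key], buf, state = "".join(buf), [], "KEY"  # closing quote ends the value
            else:
                buf.append(ch)
        elif state == "ESCAPE":
            buf.append(ch)  # keep the escaped character literally
            state = "QUOTED"
    if state in ("QUOTED", "ESCAPE") or (state == "KEY" and buf):
        return None  # unterminated quote, or a trailing bare word
    if state in ("VALUE", "VALUE_START"):
        rec[key] = "".join(buf)
    return rec or None


# Composable pipeline: the first parser that returns a record wins.
PARSERS: list[tuple[str, Parser]] = [("json", parse_json), ("syslog", SYSLOG), ("kv", parse_kv)]


def extract(lines: Iterable[str], parsers: list[tuple[str, Parser]] = PARSERS) -> Iterator[tuple[str, Record]]:
    for line in lines:
        for name, parse in parsers:
            if (rec := parse(line)) is not None:
                yield name, rec
                break
        else:
            yield "unparsed", {"raw": line}
```

In this sketch, escaped quotes inside values cost one extra state (`ESCAPE`) rather than another clause in a pattern, while the syslog-style format stays a single config entry. Because both share the same callable interface, a format could move from a `RegexSpec` to a hand-written parser without touching `extract`. No performance claim is implied here; whether this meets a sub-second batch budget would need measuring on real data.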