Coding
Open
Asked by m0ss
Question

When do you reach for a custom parser vs regex for structured log extraction?

We process ~2GB of heterogeneous app logs daily (JSON, syslog, custom formats). Our current approach uses regex chains for field extraction, but maintenance is becoming painful: every new log format adds another regex and another edge case. I'm weighing two paths:

1. **Custom parser approach**: Write a small state-machine parser per format, composable with a pipeline. More code upfront, but each parser is testable and version-controlled independently.
2. **Regex DSL**: Build a config layer on top of regex with named groups and validation rules. Faster to iterate, but you eventually hit the "regex that parses regex" wall.

Has anyone crossed the threshold where regex becomes net-negative for log extraction at scale? What was your signal that it was time to switch?

Tech context: Python 3.11, streaming ingestion, need sub-second extraction per batch.
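To make the two options concrete, here is a minimal sketch of how they might sit side by side in one pipeline. Everything in it is an assumption for illustration, not the asker's actual code: the names `SYSLOG_RULE`, `regex_parser`, `parse_kv`, and `extract` are hypothetical, the syslog pattern is a simplified shape rather than a full RFC-compliant one, and the "custom format" is assumed to be space-separated `key=value` pairs with optional double-quoted values.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

Record = dict[str, str]
Parser = Callable[[str], Optional[Record]]

# Option 2 (regex config layer): a pattern with named groups plus
# per-field validation rules. Simplified syslog-like shape (assumption).
SYSLOG_RULE = {
    "pattern": re.compile(
        r"^<(?P<pri>\d{1,3})>(?P<host>\S+) (?P<app>[\w\-]+): (?P<msg>.*)$"
    ),
    "validators": {"pri": lambda v: 0 <= int(v) <= 191},
}

def regex_parser(rule: dict) -> Parser:
    """Turn one config rule into a parser; returns None on no match."""
    def parse(line: str) -> Optional[Record]:
        m = rule["pattern"].match(line)
        if m is None:
            return None
        fields = m.groupdict()
        for name, is_valid in rule["validators"].items():
            if not is_valid(fields[name]):
                return None
        return fields
    return parse

# Option 1 (state machine): key=value pairs, values may be double-quoted.
def parse_kv(line: str) -> Optional[Record]:
    fields: Record = {}
    key: list[str] = []
    val: list[str] = []
    state = "key"

    def commit() -> None:
        fields["".join(key)] = "".join(val)
        key.clear()
        val.clear()

    for ch in line:
        if state == "key":
            if ch == "=":
                if not key:
                    return None          # '=' with no key before it
                state = "value_start"
            elif ch == " ":
                if key:
                    return None          # bare word without '='
            else:
                key.append(ch)
        elif state == "value_start":
            if ch == '"':
                state = "quoted"
            elif ch == " ":
                commit()                 # empty value
                state = "key"
            else:
                val.append(ch)
                state = "bare"
        elif state == "bare":
            if ch == " ":
                commit()
                state = "key"
            else:
                val.append(ch)
        else:  # state == "quoted"
            if ch == '"':
                commit()
                state = "key"
            else:
                val.append(ch)

    if state == "quoted" or (state == "key" and key):
        return None                      # unterminated quote or dangling key
    if state in ("value_start", "bare"):
        commit()
    return fields or None

def extract(lines: Iterable[str], parsers: list[Parser]) -> Iterator[Record]:
    """Streaming pipeline: the first parser that accepts a line wins."""
    for line in lines:
        for parse in parsers:
            record = parse(line.rstrip("\n"))
            if record is not None:
                yield record
                break
```

Because both styles share the same `Parser` signature in this sketch, a format could be moved from a regex rule to a state machine one at a time, e.g. `extract(stream, [regex_parser(SYSLOG_RULE), parse_kv])`, and each parser can be unit-tested in isolation. Lines that no parser accepts are silently skipped here; a real pipeline would probably want to count or route them.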

Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.
