Memory-mapped files vs Redis for sub-millisecond lookups in Python

Question

We're running a feature-flag evaluation service that needs <1 ms P99 latency across ~50K flag keys. It's currently on Redis (cached, but still a network hop). Two alternatives are on the table:

1. Memory-mapped file (mmap) with a binary index: flags are baked into a read-only .dat file that is swapped on deploy. No network hop, only page-fault latency.
2. Redis Cluster with pipelining + local LRU cache: hot flags stay in-process, and misses fall back to the cluster.
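
For concreteness, option 1 could look roughly like the sketch below. The file layout (an 8-byte header followed by sorted fixed-width records), the 64-byte key limit, and the names `build_flag_file` and `FlagFile` are assumptions made for illustration, not an existing format or library.

```python
import mmap
import struct

# Hypothetical layout: header (magic + record count), then sorted
# fixed-width records of [null-padded key][1-byte flag value].
KEY_SIZE = 64                      # assumed maximum key length in bytes
HEADER = struct.Struct("<4sI")     # magic, record count
RECORD_SIZE = KEY_SIZE + 1
MAGIC = b"FLG1"


def build_flag_file(path, flags):
    """Write a dict of str -> bool as records sorted by padded key."""
    records = []
    for key, value in flags.items():
        raw = key.encode("utf-8")
        if len(raw) > KEY_SIZE:
            raise ValueError(f"key longer than {KEY_SIZE} bytes: {key!r}")
        records.append((raw.ljust(KEY_SIZE, b"\0"), bool(value)))
    records.sort()
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, len(records)))
        for padded_key, value in records:
            f.write(padded_key + (b"\x01" if value else b"\x00"))


class FlagFile:
    """Read-only view over the flag file, looked up by binary search."""

    def __init__(self, path):
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        magic, self._count = HEADER.unpack_from(self._mm, 0)
        if magic != MAGIC:
            raise ValueError("not a flag file")

    def get(self, key, default=None):
        target = key.encode("utf-8").ljust(KEY_SIZE, b"\0")
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            off = HEADER.size + mid * RECORD_SIZE
            candidate = self._mm[off:off + KEY_SIZE]
            if candidate == target:
                return self._mm[off + KEY_SIZE] == 1
            if candidate < target:
                lo = mid + 1
            else:
                hi = mid
        return default

    def close(self):
        self._mm.close()
```

With ~50K keys this layout comes to roughly 3 MB and about 16 comparisons per lookup. Whether that stays under 1 ms at P99 depends on the pages being resident, which is the cold-read question asked below.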

Constraints: flags change ~5x/day via CI pipeline. Read-heavy (99.9% evals, 0.1% writes). Service runs on Kubernetes, 3 replicas.

Has anyone shipped mmap-backed config in production? What's the real page-fault story on the first cold read after a rolling deploy? And: how do you handle the atomicity of swapping the .dat file without a brief window of corrupt reads?
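
To pin down what "atomic swap" means here, the usual pattern is sketched below: write the new file to a temporary name in the same directory, then rename it over the live path. The helper names `publish_atomically` and `open_mapping` are hypothetical, and the behavior described assumes a POSIX filesystem, where a rename within one filesystem is atomic and existing mappings keep pointing at the old inode.

```python
import mmap
import os
import tempfile


def publish_atomically(path, payload):
    """Write payload to a temp file next to path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # readers see either the old or new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def open_mapping(path):
    """Map the file currently at path read-only."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A reader that mapped the old file keeps reading the old contents until it calls `open_mapping` again, so no reader ever sees a half-written file. What remains open is how each replica learns that it should remap, and what the first reads against the fresh mapping cost.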

Not looking for theoretical answers — curious about actual P99 numbers and war stories.

Direct answers and proposed approaches

Risks, gaps, and constructive pushback