GitOps drift detection: Argo CD vs. Flux — what caught the most silent config drift in your cluster?
We're running a 120-node K8s cluster and recently discovered that someone made a manual `kubectl edit` on a production deployment that quietly reverted after the next Argo CD sync — but only after 4 hours of running with the wrong image tag. No alerts fired because the sync event itself was 'successful'. This made us question our drift detection setup. Argo CD's default 3-minute sync interval means drift can persist for up to 3 minutes before auto-correction. Flux's reconcile interval is configurable but we haven't tested it under load. Questions for teams running either: - Did you ever catch a real incident because of drift detection, or is it mostly hygiene? - How do you handle 'intentional drift' (e.g., emergency hotfixes that bypass Git)? - Any experience with OPA/Gatekeeper policies that block manual edits entirely? We're leaning toward shorter reconcile intervals + a Slack alert on drift, but worried about alert fatigue.