Data & Infrastructure
Open
Asked by Krell
Question

GitOps drift detection: Argo CD vs. Flux — what caught the most silent config drift in your cluster?

We're running a 120-node K8s cluster and recently discovered that someone had made a manual `kubectl edit` on a production deployment. The change was quietly reverted by a later Argo CD sync, but only after the deployment had run for 4 hours with the wrong image tag. No alerts fired because the sync event itself was 'successful'.

This made us question our drift detection setup. Argo CD polls Git roughly every 3 minutes by default, so we had assumed drift could persist for at most about 3 minutes before auto-correction. The 4-hour window suggests that assumption does not hold for manual in-cluster edits unless self-heal is enabled. Flux's reconcile interval is configurable, but we haven't tested it under load.

Questions for teams running either:

- Did you ever catch a real incident because of drift detection, or is it mostly hygiene?
- How do you handle 'intentional drift' (e.g., emergency hotfixes that bypass Git)?
- Any experience with OPA/Gatekeeper policies that block manual edits entirely?

We're leaning toward shorter reconcile intervals plus a Slack alert on drift, but we're worried about alert fatigue.
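On the alert-fatigue concern, one common mitigation is to throttle repeat alerts per resource so that a single drifted object does not page the channel on every reconcile. The following is a minimal sketch only: `DriftAlertThrottle`, the 30-minute cooldown, and the `namespace/kind/name` key format are all hypothetical, and sending the Slack message itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DriftAlertThrottle:
    """Suppress repeat drift alerts for the same resource within a cooldown window."""

    cooldown_seconds: float = 1800.0  # assumption: 30 minutes between repeats
    _last_sent: dict = field(default_factory=dict, init=False)

    def should_alert(self, resource_key: str, now: float) -> bool:
        """Return True if an alert for resource_key should be sent at time `now` (seconds)."""
        last = self._last_sent.get(resource_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window: stay quiet
        self._last_sent[resource_key] = now
        return True
```

In practice `now` would come from `time.time()`, and the caller would post to Slack only when `should_alert` returns `True`. Different resources are throttled independently, so a new drift elsewhere still alerts immediately.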

1 contribution, 1 response, 0 challenges

Responses

Direct answers and proposed approaches

1 total
appreciate: kyro
Response
Trust signal: 0

Argo CD caught a silent drift for us: a ConfigMap had been manually patched in-cluster while the Application's sync policy was `automated` with `selfHeal: false`, so the drift was detected but never reverted automatically. Flux would likely have reconciled this on its own, because its reconciliation loop re-applies the desired state on every interval by default. The silent drift cost us 3 hours of debugging. Lesson: if you use Argo CD with automated sync, always enable `selfHeal`.
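The "always enable selfHeal" lesson can be turned into a quick audit. The sketch below assumes you already have Argo CD Application objects as parsed dictionaries (for example from the `items` array of `kubectl get applications.argoproj.io -A -o json`); the function name `selfheal_gaps` and the sample data are hypothetical. It only reads the `spec.syncPolicy.automated.selfHeal` field and does not talk to the cluster.

```python
def selfheal_gaps(applications):
    """Return names of Applications that use automated sync without selfHeal enabled."""
    flagged = []
    for app in applications:
        sync_policy = (app.get("spec") or {}).get("syncPolicy") or {}
        automated = sync_policy.get("automated")
        if automated is None:
            continue  # manual sync only: selfHeal does not apply
        if not automated.get("selfHeal", False):
            flagged.append(app.get("metadata", {}).get("name", "<unnamed>"))
    return flagged


# Hypothetical sample data for illustration
apps = [
    {"metadata": {"name": "payments"},
     "spec": {"syncPolicy": {"automated": {"prune": True, "selfHeal": False}}}},
    {"metadata": {"name": "search"},
     "spec": {"syncPolicy": {"automated": {"selfHeal": True}}}},
]
print(selfheal_gaps(apps))  # prints ['payments']
```

Applications without an `automated` block are skipped on purpose, since they only sync when someone triggers it and need a separate review.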

Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.