GDPR Art. 17 right to erasure vs. AI model training data: can you truly delete someone from a trained model?
When a data subject invokes Art. 17 GDPR (right to erasure / "right to be forgotten") and one of the grounds in Art. 17(1) applies, the controller must erase their personal data without undue delay. But what happens when that data was used to train an ML model? Deleting the record from the training set does nothing to weights that were already learned from it. The only guaranteed way to remove its influence is to retrain from scratch without it, and until that happens the model may still encode statistical patterns learned from that individual's data.

Questions I'm wrestling with:

1. **Is model weight data "personal data" under Art. 4(1) GDPR?** If an attack on the model (e.g., membership inference or model inversion) can reveal information about a specific person, does that make the weights themselves personal data subject to erasure?
2. **Retraining cost vs. compliance:** For a production model trained on millions of samples, retraining is costly and may introduce performance regressions. Has anyone documented a proportionality argument? Art. 17(2) does mention "available technology and the cost of implementation", but as far as I can tell only for the duty to inform other controllers when the data has been made public, so I'm not sure it covers erasure from the model itself.
3. **Anonymization before training:** If you apply strong anonymization or privacy protection (k-anonymity on the dataset, differential privacy on the data or during training), does that eliminate the erasure obligation? Or does the original collection of identifiable data still trigger it?
4. **Practical approaches:** Are teams using machine unlearning techniques (e.g., SISA sharded training, or approximate methods based on influence functions) to handle Art. 17 requests without full retraining? Have any implementations passed a DPA review?

Would love to hear from people who've faced this in production.
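To make question 1 concrete: the simplest membership inference attack I know of is loss thresholding. If the model's loss on a record is suspiciously low, guess that the record was in the training set. Below is a toy sketch with a deliberately overfit 1-nearest-neighbour "model". The model, the threshold of 0.1, and the function names are hypothetical stand-ins; real attacks (e.g., ones based on shadow models) are more involved, but they exploit the same member/non-member gap.

```python
import math


def nn_loss(train_points, x):
    """Toy 'loss' of a deliberately overfit 1-nearest-neighbour model:
    the distance from x to the closest training point.

    Every training point scores exactly 0, which is the kind of gap
    between members and non-members that membership inference exploits.
    """
    return min(math.dist(p, x) for p in train_points)


def loss_threshold_attack(loss_fn, candidates, threshold):
    """Guess 'was in the training set' when the model's loss on a
    candidate record is at or below the (hypothetical) threshold."""
    return {name: loss_fn(x) <= threshold for name, x in candidates.items()}


# Hypothetical training set and candidate records.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
candidates = {"in_training": (1.0, 1.0), "not_in_training": (3.0, 3.0)}

guesses = loss_threshold_attack(lambda x: nn_loss(train, x), candidates, threshold=0.1)
print(guesses)  # {'in_training': True, 'not_in_training': False}
```

If an attack like this can tell that a specific person's record was used, the model leaks at least that fact about them, which is what makes the Art. 4(1) question hard to dismiss.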
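For question 3, this is roughly what a k-anonymity check on a training set involves: every combination of quasi-identifier values must occur at least k times. It's a minimal sketch; the choice of `zip` and `age` as quasi-identifiers, the generalised rows, and the function names are all hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group_size(rows, quasi_identifiers) >= k


# Hypothetical, already-generalised training rows.
rows = [
    {"zip": "101**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "101**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "102**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "102**", "age": "40-49", "diagnosis": "diabetes"},
]
qi = ["zip", "age"]
print(is_k_anonymous(rows, qi, k=2))  # True
print(is_k_anonymous(rows, qi, k=3))  # False
```

Passing a check like this only constrains the chosen quasi-identifiers. It says nothing by itself about whether the result counts as anonymous under GDPR, which is exactly what question 3 is asking.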
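For question 4, this is how I understand SISA (Sharded, Isolated, Sliced, Aggregated training) avoiding full retraining, as a toy sketch. The data is split into disjoint shards, one model is trained per shard in isolation, and predictions are aggregated by majority vote. On an erasure request, only the shard that held the record is retrained. The nearest-centroid "model", round-robin sharding, and all names (`SisaEnsemble`, `forget`, `train_centroids`) are hypothetical stand-ins. The "sliced" part (checkpointing within a shard) is left out.

```python
from collections import Counter


def train_centroids(records):
    """Toy per-shard 'model' (hypothetical): the mean feature vector for each label."""
    sums, counts = {}, Counter()
    for _, x, y in records:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroids(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


class SisaEnsemble:
    """Sharded, isolated training with majority-vote aggregation (slicing omitted)."""

    def __init__(self, records, num_shards):
        self.shards = [[] for _ in range(num_shards)]
        self.shard_of = {}  # record_id -> shard index, needed to route deletions
        for i, rec in enumerate(records):
            s = i % num_shards  # round-robin assignment
            self.shards[s].append(rec)
            self.shard_of[rec[0]] = s
        # Each constituent model sees only its own shard.
        self.models = [train_centroids(shard) for shard in self.shards]

    def predict(self, x):
        # Majority vote over the non-empty constituent models.
        votes = Counter(predict_centroids(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]

    def forget(self, record_id):
        """Drop one record and retrain only the shard that contained it."""
        s = self.shard_of.pop(record_id)
        self.shards[s] = [r for r in self.shards[s] if r[0] != record_id]
        self.models[s] = train_centroids(self.shards[s])
        return s


# Hypothetical (record_id, features, label) training data.
data = [
    ("alice", (0.0, 0.1), 0), ("bob", (0.2, 0.0), 0), ("carol", (5.0, 5.1), 1),
    ("dave", (5.2, 4.9), 1), ("erin", (0.1, 0.3), 0), ("frank", (4.8, 5.0), 1),
]
ens = SisaEnsemble(data, num_shards=3)
retrained = ens.forget("carol")
print(retrained)                # 2: only one of the three shards is retrained
print(ens.predict((5.0, 5.0)))  # 1
```

After `forget`, the retrained shard model is identical to one trained from scratch on that shard without the record, which is the property that makes SISA "exact" unlearning. It's also question 2 in miniature: more shards make each deletion cheaper, but each constituent model sees less data. Whether retraining one shard would satisfy a DPA is still the open question.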