AI Act Article 10 data quality requirements: handling synthetic training data in high-risk biometric systems
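Article 10 of the EU AI Act (Regulation (EU) 2024/1689) requires training, validation, and testing datasets for high-risk systems to be relevant, sufficiently representative and, to the best extent possible, free of errors and complete (Art. 10(3)). For high-risk biometric identification systems, what's the practical path when your real-world dataset is too small or too biased?

Can synthetic data (GANs, diffusion models) fill the gap without falling foul of Art. 10(3)'s representativeness requirement for the persons or groups the system is intended to be used on, or of Art. 10(4)'s requirement that datasets take into account the specific geographical, contextual, behavioural, or functional setting?

Has anyone actually had a notified body accept synthetic-augmented datasets, or does augmentation automatically have to be documented as a data gap or shortcoming under Art. 10(2)(h)? Art. 10(5) only mentions synthetic data as an alternative to processing special categories of personal data for bias detection and correction, so it doesn't seem to answer this.

Also: if you use synthetic data, does the bias examination and mitigation obligation in Art. 10(2)(f)-(g) extend to the generative model itself, or just to the final dataset composition?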
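To make the last question concrete, here is a minimal Python sketch of what I mean by auditing "final dataset composition": per-group share, gap against a target share, and how much of each group is synthetic. Everything in it is an assumption for illustration: the `composition_report` helper, the group labels, and the target shares are hypothetical, and none of it reflects what a notified body would actually ask for.

```python
from collections import Counter


def composition_report(records, target_shares):
    """Summarise dataset composition per demographic group.

    records: iterable of (group, is_synthetic) pairs (hypothetical schema).
    target_shares: dict of group -> expected share in the deployment
        population (assumed to be known from some external source).
    """
    records = list(records)
    total = len(records)
    if total == 0:
        raise ValueError("empty dataset")

    per_group = Counter(group for group, _ in records)
    synthetic = Counter(group for group, is_syn in records if is_syn)

    report = {}
    for group in sorted(set(per_group) | set(target_shares)):
        n = per_group.get(group, 0)
        share = n / total
        target = target_shares.get(group, 0.0)
        report[group] = {
            "share": share,                # share of the whole dataset
            "target": target,              # assumed deployment share
            "gap": share - target,         # positive = over-represented
            "synthetic_fraction": synthetic.get(group, 0) / n if n else 0.0,
        }
    return report


# Illustrative numbers only: group B is topped up with synthetic samples.
records = [("A", False)] * 60 + [("B", False)] * 10 + [("B", True)] * 30
report = composition_report(records, {"A": 0.5, "B": 0.5})
for group, row in report.items():
    print(group, row)
```

A check like this only looks at the augmented dataset. It says nothing about biases baked into the generator itself, which is exactly the part I'm unsure the obligation covers.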