EU AI Act Article 13 transparency obligations: documenting training data provenance for high-risk medical AI systems
When building a high-risk AI system under the EU AI Act (Annex II, Article 13), how are you handling the transparency obligation around training data provenance? Specifically:

1. **Data lineage documentation**: Article 13 requires that the system's capabilities and limitations be documented. For a medical diagnostic model trained on multi-institutional datasets, does your team trace each data source back to its original consent framework (e.g. broad consent under GDPR Art. 9(2)(j) vs. specific consent)?
2. **Training data vs. fine-tuning data**: If a base model was pre-trained on general medical literature and then fine-tuned on proprietary hospital data, which data provenance chain needs to be documented for Article 13 compliance: both, or only the fine-tuning layer?
3. **SOC 2 intersection**: Are teams mapping AI Act transparency requirements to SOC 2 CC6.1 (logical access) and CC7.1 (system monitoring) controls, or keeping them as separate audit trails?

Looking for practical implementations, not just regulatory theory. What did your auditors actually ask for?
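
To make the question concrete, here is a minimal sketch of the kind of per-source lineage record I have in mind. Everything in it is an assumption for illustration: the class and field names (`DataSourceRecord`, `consent_scope`, `soc2_controls`), the two-stage split, and the check on fine-tuning sources are hypothetical and do not come from the AI Act, GDPR, or SOC 2 text. It is meant as a starting point for discussion, not as something that by itself satisfies Article 13.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List, Tuple


class Stage(Enum):
    """Which training layer a data source fed into (hypothetical split)."""
    PRETRAINING = "pretraining"
    FINE_TUNING = "fine_tuning"


@dataclass(frozen=True)
class DataSourceRecord:
    """One entry in a provenance manifest; all field names are illustrative."""
    source_id: str
    stage: Stage
    origin: str          # institution or corpus the data came from
    legal_basis: str     # free text, as recorded by whoever owns the legal review
    consent_scope: str   # e.g. "broad", "specific", "not_applicable", "unknown"
    soc2_controls: Tuple[str, ...] = ()  # optional cross-references, e.g. ("CC6.1",)


def provenance_manifest(records: List[DataSourceRecord]) -> str:
    """Group records by training stage and serialize them to JSON."""
    grouped = {stage.value: [] for stage in Stage}
    for record in records:
        entry = asdict(record)
        entry["stage"] = record.stage.value  # Enum is not JSON-serializable as-is
        grouped[record.stage.value].append(entry)
    return json.dumps(grouped, indent=2, sort_keys=True)


def sources_missing_consent_scope(records: List[DataSourceRecord]) -> List[str]:
    """Return IDs of fine-tuning sources whose consent scope is not recorded."""
    known_scopes = {"broad", "specific"}
    return [
        record.source_id
        for record in records
        if record.stage is Stage.FINE_TUNING
        and record.consent_scope not in known_scopes
    ]
```

With a structure like this, both provenance chains from question 2 live in one manifest, and the SOC 2 cross-references from question 3 are just an optional field, so the same record could feed either a shared or a separate audit trail.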