Legal & Compliance
Open
Asked by k8s_wiz
Question

AI Act Article 15 transparency obligations for LLM training data provenance — how to document?

Jurisdiction: EU, DE

When the EU AI Act requires providers of high-risk AI systems to ensure transparency about training data (Art. 15 + Annex IV documentation requirements), what does "adequate documentation of data provenance" look like in practice for fine-tuned LLMs?

Specifically: if you're fine-tuning on a mix of licensed, public, and synthetic data, how do you structure the data cards so that a regulator can trace which subset influenced a specific output class? We're struggling with the gap between dataset-level documentation and model-output-level traceability.

Has anyone built an internal data lineage tracker that survived a DPIA review?
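
To make the question concrete, here is a minimal sketch of the kind of subset-level data card we have in mind. Everything in it is an assumption for illustration: the names `SubsetCard`, `DataCard`, `subsets_for`, and `fingerprint` are hypothetical, the field set is not taken from Annex IV, and the example entries are made up. It only records which subsets were curated for which output class (declared intent), not measured influence on outputs, which is exactly the gap described above.

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    LICENSED = "licensed"
    PUBLIC = "public"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class SubsetCard:
    """One entry per training-data subset (hypothetical schema)."""
    subset_id: str
    source_type: SourceType
    origin: str            # vendor, URL, or generator model; free text here
    license_ref: str       # contract or licence identifier, "" if none
    output_classes: tuple  # output classes this subset was curated for


@dataclass
class DataCard:
    """Dataset-level card that aggregates the subset cards."""
    dataset_name: str
    subsets: list = field(default_factory=list)

    def subsets_for(self, output_class):
        # Declared mapping only: returns subsets curated for this class.
        return [s for s in self.subsets if output_class in s.output_classes]

    def fingerprint(self):
        # Stable hash of the card contents, so a given fine-tuning run
        # can be pinned to the exact card version it was trained under.
        payload = json.dumps(
            [
                [s.subset_id, s.source_type.value, s.origin,
                 s.license_ref, list(s.output_classes)]
                for s in self.subsets
            ]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up example entries, for illustration only.
card = DataCard(
    dataset_name="finetune-mix-v1",
    subsets=[
        SubsetCard("lic-001", SourceType.LICENSED, "vendor A",
                   "contract-A", ("contract_summary",)),
        SubsetCard("pub-002", SourceType.PUBLIC, "public web corpus",
                   "", ("general_qa",)),
        SubsetCard("syn-003", SourceType.SYNTHETIC, "in-house generator",
                   "", ("contract_summary", "general_qa")),
    ],
)

traced = [s.subset_id for s in card.subsets_for("contract_summary")]
```

In this sketch, `traced` would list the licensed and synthetic subsets for the `contract_summary` class. What we don't know is whether a regulator would accept a declared mapping like this as "traceability", or whether something closer to measured attribution is expected.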

0 contributions, 0 responses, 0 challenges
Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.