The EU AI Act is the world's first comprehensive, horizontal AI regulation. For financial institutions using AI in credit scoring, anti-money laundering, or fraud detection, one provision demands immediate attention: Article 10, which sets binding requirements for training, validation, and testing datasets.
The enforcement deadline for high-risk AI systems is August 2, 2026. That is not a distant horizon. It is an operational deadline that requires concrete changes to how you source, document, and govern the data that feeds your AI models.
This guide breaks down what Article 10 requires, who is affected, and how to achieve compliance before the deadline.
What Article 10 Requires
Article 10 of the EU AI Act establishes mandatory data governance practices for any training, validation, and testing datasets used in high-risk AI systems. The requirements are specific and enforceable.
Data Governance and Management Practices
Under Article 10(2), providers of high-risk AI systems must implement data governance that addresses:
- Design choices for data collection and origin
- Data collection processes and their documentation
- Relevant data preparation operations (annotation, labeling, cleaning, enrichment)
- Assessment of data availability, quantity, and suitability
- Examination for possible biases likely to affect health, safety, or fundamental rights
- Identification of data gaps or shortcomings and how they are addressed
Quality Criteria for Datasets
Article 10(3) and 10(4) mandate that training, validation, and testing datasets must be:
- Relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose
- Supported by appropriate statistical properties for the persons or groups of persons on whom the system is intended to be used
- Reflective of the specific geographical, contextual, behavioral, or functional setting in which the system is intended to be used (Article 10(4))
Documentation Requirements
Article 10(2), read together with the technical documentation requirements of Article 11 and Annex IV, requires that providers document:
- The characteristics and composition of datasets
- How data was obtained and selected
- What preprocessing and labeling methods were applied
- Any assumptions made about the data
This is not guidance. These are binding obligations with penalties up to 3% of global annual turnover or EUR 15 million, whichever is higher.
Timeline: What Happened and What Is Coming
Understanding the enforcement timeline is critical for planning.
| Date | Milestone |
|---|---|
| August 1, 2024 | EU AI Act entered into force |
| February 2, 2025 | Prohibitions on unacceptable-risk AI systems took effect |
| August 2, 2025 | General-purpose AI model obligations apply |
| August 2, 2026 | High-risk AI system obligations become enforceable |
| August 2, 2027 | High-risk AI systems that are safety components of products covered by Annex I legislation receive additional time |
The August 2, 2026 deadline is the critical date for financial institutions. After this date, deploying a high-risk AI system with non-compliant training data exposes your organization to enforcement action.
Who Is Affected
Financial institutions are disproportionately affected because many financial AI use cases are classified as high-risk under Annex III of the EU AI Act.
High-Risk AI Systems in Finance (Annex III, Point 5)
The following AI applications in financial services are explicitly classified as high-risk:
- Creditworthiness assessment and credit scoring of natural persons (Point 5(b)), with the exception of AI systems used to detect financial fraud
- Risk assessment and pricing for life and health insurance of natural persons (Point 5(c))
Additional Financial AI Use Cases Likely Covered
- Anti-money laundering (AML) screening involving profiling of natural persons
- Fraud detection systems that make decisions affecting individuals (noting the Point 5(b) carve-out for systems used purely to detect financial fraud)
- Customer due diligence automation involving risk classification of persons
- Algorithmic trading systems that interact with market infrastructure
If your institution uses AI for any of these purposes and operates in or serves the EU market, Article 10 compliance is mandatory.
The Training Data Problem
Most financial institutions face a fundamental tension: they need realistic, representative training data, but the data they have access to creates governance burdens.
Real Data Creates Governance Overhead
Using real customer data for AI training triggers the full weight of GDPR obligations:
- Legal basis required under GDPR Article 6 (legitimate interest or consent)
- Data Protection Impact Assessment (DPIA) required under GDPR Article 35
- Purpose limitation under GDPR Article 5(1)(b) may restrict reuse of production data
- Data subject rights must be accommodated (access, erasure, objection)
- Cross-border transfer restrictions under GDPR Chapter V apply
Each of these creates documentation overhead, legal review cycles, and operational risk.
Anonymized Data Still Carries Risk
Many institutions turn to data anonymization as a compromise. But anonymized data has its own Article 10 problems:
- Re-identification risk means the data may still qualify as personal data under the test in GDPR Recital 26
- The anonymization process itself involves processing personal data, requiring GDPR compliance
- Documentation burden must cover both the source data and the anonymization method
- Bias from source data passes through anonymization unchanged
- Data gaps cannot be filled because anonymization only transforms what already exists
The Audit Trail Gap
Article 10 demands comprehensive documentation of data provenance. For real or anonymized data, this means tracing every record back through collection, consent, transformation, and quality checks. For large-scale training datasets, this documentation burden is substantial and ongoing.
How Born Synthetic Data Simplifies Article 10 Compliance
Born Synthetic data is generated entirely from mathematical models and statistical distributions. No real-world personal data is used as input, at any stage, for any purpose. This architectural choice has direct implications for Article 10 compliance.
Zero Processing of Personal Data
Because Born Synthetic data is generated from Pareto distributions, algebraic constraints, and cultural archetype models rather than real data, there is no personal data processing at any stage. This means:
- No GDPR legal basis required for the generation process
- No DPIA required for dataset creation
- No data subject rights to manage
- No cross-border transfer restrictions on source data (because there is no source data)
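To make the point concrete, here is a minimal sketch of what purely parametric generation looks like, assuming a simple Pareto-based wealth model. The distribution parameters, archetype labels, and geographic niche names below are illustrative assumptions for this example, not the actual generation pipeline:

```python
import random

# Hypothetical generation parameters -- explicit, versioned, and auditable.
# None of these values is learned from real customer data.
PARETO_ALPHA = 1.16               # illustrative shape parameter for a Pareto wealth model
WEALTH_FLOOR_EUR = 30_000_000     # illustrative minimum wealth for the UHNWI segment
ARCHETYPES = ["entrepreneur", "inheritor", "executive", "investor"]   # illustrative subset
GEOGRAPHIC_NICHES = ["DACH", "Nordics", "Benelux", "Iberia", "France", "Italy"]  # illustrative

def generate_record(rng: random.Random) -> dict:
    """Create one fully synthetic profile from distributions and rules only."""
    net_worth = WEALTH_FLOOR_EUR * rng.paretovariate(PARETO_ALPHA)
    liquid_share = rng.uniform(0.05, 0.40)   # algebraic constraint: 5-40% of wealth is liquid
    return {
        "archetype": rng.choice(ARCHETYPES),
        "geographic_niche": rng.choice(GEOGRAPHIC_NICHES),
        "net_worth_eur": round(net_worth, 2),
        "liquid_assets_eur": round(net_worth * liquid_share, 2),  # always <= net worth
    }

def generate_dataset(n_records: int, seed: int = 42) -> list[dict]:
    """Generate a reproducible dataset; the seed and size are part of the documentation."""
    rng = random.Random(seed)
    return [generate_record(rng) for _ in range(n_records)]

print(generate_dataset(n_records=3))
```

Every value is traceable to an explicit, versioned parameter rather than to a person, and scaling from 10,000 to 100,000 records is a change to n_records, a point the scalability section below returns to.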
Documented Origin by Design
Every Born Synthetic dataset ships with a Certificate of Sovereign Origin that documents:
- The mathematical models used for generation
- The statistical distributions and parameters applied
- The enrichment pipeline (Math First, AI Enrichment, or FORGE Mode)
- The version of the generation pipeline
- The absence of any real data inputs
This directly supports the documentation obligations under Article 10(2) and the dataset descriptions required in the technical documentation under Annex IV.
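As an illustration only, the machine-readable core of such a certificate might look like the following sketch. The field names, values, and schema are assumptions made for this example, not the actual certificate format:

```python
# Illustrative certificate metadata -- the schema and field names are assumptions,
# not the actual Certificate of Sovereign Origin format.
certificate_of_origin = {
    "dataset_id": "uhnwi-profiles-2026-03-001",
    "generation_pipeline": {
        "mode": "Math First",          # or "AI Enrichment" / "FORGE Mode"
        "version": "3.2.1",            # placeholder pipeline version
    },
    "mathematical_models": [
        {"attribute": "net_worth_eur", "model": "Pareto",
         "alpha": 1.16, "floor_eur": 30_000_000},
        {"attribute": "liquid_assets_eur", "model": "algebraic_constraint",
         "rule": "uniform(0.05, 0.40) * net_worth_eur"},
    ],
    "archetype_design": {"profiles": 31, "geographic_niches": 6},
    "real_data_inputs": None,          # documented absence of any real-world source data
    "intended_use": "training, validation, and testing of high-risk AI systems (EU AI Act, Art. 10)",
}
```

Because every entry describes a model or a design choice rather than a data source, a record like this can feed directly into the dataset description that Annex IV expects in the technical documentation.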
Bias Examination Is Built Into the Process
Article 10(2)(f) requires examination for biases. With Born Synthetic data:
- Statistical distributions are explicit and auditable
- Archetype selection across 31 profiles and 6 geographic niches is a documented design choice
- No hidden biases from historical data can leak into the dataset
- Bias characteristics can be adjusted by modifying generation parameters
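A minimal sketch of such an examination, assuming equal target coverage across six hypothetical geographic niches (the niche names and tolerance are assumptions for this example), compares the realized distribution of a contextual attribute against the documented design target:

```python
from collections import Counter

# Illustrative design targets -- the niche names and tolerance are assumptions.
GEOGRAPHIC_NICHES = ["DACH", "Nordics", "Benelux", "Iberia", "France", "Italy"]
TARGET_SHARE = 1 / len(GEOGRAPHIC_NICHES)   # equal coverage by design
TOLERANCE = 0.02                             # flag deviations above 2 percentage points

def examine_representation(dataset: list[dict]) -> dict:
    """Compare realized niche shares in a generated dataset against the design target."""
    counts = Counter(record["geographic_niche"] for record in dataset)
    total = len(dataset)
    report = {}
    for niche in GEOGRAPHIC_NICHES:
        share = counts[niche] / total
        report[niche] = {
            "share": round(share, 4),
            "deviation_from_target": round(share - TARGET_SHARE, 4),
            "within_tolerance": abs(share - TARGET_SHARE) <= TOLERANCE,
        }
    return report

# Toy demonstration; in practice this runs over the full generated dataset.
toy_dataset = [{"geographic_niche": niche} for niche in GEOGRAPHIC_NICHES * 100]
print(examine_representation(toy_dataset))
```

Because the targets are design parameters rather than properties inherited from a historical dataset, a deviation flagged here is corrected by adjusting the generation parameters, regenerating, and re-running the same check.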
Scalability Without Governance Scaling
Need 100,000 records instead of 10,000? With real or anonymized data, scaling the dataset means scaling the governance. With Born Synthetic, scaling is a parameter change in the generation pipeline. The governance documentation remains the same.
Comparison: Real Data vs. Anonymized vs. Born Synthetic for Article 10
| Criteria | Real Data | Anonymized Data | Born Synthetic |
|---|---|---|---|
| GDPR legal basis required | Yes | Yes (for source data) | No |
| DPIA required | Yes | Yes (for anonymization process) | No |
| Re-identification risk | High | Medium (never zero) | Zero |
| Article 10 documentation burden | Very high | High | Low (Certificate of Origin) |
| Bias from historical data | Present | Present (passes through) | Controllable by design |
| Scalability | Limited by data access | Limited by source data | Unlimited |
| Cross-border transfer complexity | High | Medium | None |
| Data subject rights management | Required | May be required | Not applicable |
| Cold-start capability | No (requires existing data) | No (requires existing data) | Yes |
| Time to compliant dataset | Months | Weeks | Hours |
| Ongoing governance cost | High | Medium | Minimal |
Checklist: Is Your Training Data Article 10 Ready?
Use this checklist to assess your current position. Most items map to a specific Article 10 requirement; a few reference related GDPR and AI Act obligations.
- [ ] Data governance framework documented — Article 10(2): You have written policies covering data collection, preparation, and management for each high-risk AI system
- [ ] Data provenance recorded for every training record — Article 10(2)(b): You can trace each record to its origin, including collection method and any transformations
- [ ] Legal basis established for data processing — GDPR Article 6 + Article 10(2)(b): If using real data, you have documented the legal basis for processing it as training data
- [ ] DPIA completed for training data pipeline — GDPR Article 35 + Article 10(2): If processing personal data, your DPIA covers the AI training use case
- [ ] Bias examination performed and documented — Article 10(2)(f): You have assessed datasets for biases affecting fundamental rights, with findings documented
- [ ] Data representativeness validated — Article 10(3) and 10(4): You have verified that datasets are representative of the deployment context (geography, demographics, behavior)
- [ ] Data gaps identified and mitigated — Article 10(2)(h): You have documented known gaps in coverage and your mitigation strategy
- [ ] Validation and testing datasets separated — Article 10(1): Training, validation, and testing datasets are distinct and governed independently
- [ ] Documentation audit-ready — Article 10(2) and Annex IV: All documentation is compiled, version-controlled, and accessible for regulatory inspection
- [ ] Ongoing monitoring plan in place — Article 72: You have a post-market process for monitoring data quality and relevance over the system lifecycle
If fewer than 7 items are checked, your training data governance needs significant work before August 2026.
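One practical way to keep this assessment audit-ready is to maintain the checklist as a structured evidence register per high-risk AI system rather than as prose. The sketch below is illustrative; the system name, item wording, and artifact paths are placeholders, not a prescribed format:

```python
# Illustrative Article 10 evidence register -- system name, wording, and artifact
# paths are placeholders, not a prescribed format.
evidence_register = {
    "ai_system": "retail-credit-scoring-v4",
    "items": [
        {"requirement": "Art. 10(2) data governance framework documented",
         "status": "done", "evidence": "policies/data-governance-v2.pdf"},
        {"requirement": "Art. 10(2)(b) data provenance recorded per record",
         "status": "gap", "evidence": None},
        {"requirement": "Art. 10(2)(f)-(g) bias examination and mitigation",
         "status": "done", "evidence": "reports/bias-review-2026-02.pdf"},
        {"requirement": "Art. 10(3)-(4) representativeness validated",
         "status": "in_progress", "evidence": "reports/representativeness-draft.md"},
    ],
}

open_items = [i["requirement"] for i in evidence_register["items"] if i["status"] != "done"]
print(f"{len(open_items)} open items before August 2026:", open_items)
```

A register like this also makes the readiness threshold above trivial to track across your full AI inventory.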
What to Do Now: Practical Steps Before August 2026
Immediate (Q1-Q2 2026)
- Inventory your high-risk AI systems against Annex III. Identify every system that uses training data in a financial decision-making context.
- Audit existing training data provenance. Can you document the origin of every record? If not, flag the gap.
- Assess your current data governance framework against Article 10(2) requirements. Identify missing elements.
Medium-Term (Q2-Q3 2026)
- Evaluate synthetic data for compliance-critical use cases. Start with AI systems where real data governance is most expensive or risky.
- Build or procure compliant datasets for validation and testing. These are often lower-risk starting points for synthetic data adoption.
- Document bias examinations for all training datasets. This is a common gap in existing governance frameworks.
Pre-Deadline (Q3 2026)
- Compile audit-ready documentation packages for each high-risk AI system.
- Conduct internal readiness review against the full Article 10 checklist.
- Engage legal counsel to validate your documentation meets Member State enforcement expectations.
Assess Your Current Risk Exposure
Not sure where your organization stands? The GDPR Risk Assessment tool provides a free, instant evaluation of your training data regulatory exposure, including Article 10 readiness indicators.
You can also download a free sample dataset of 100 Born Synthetic UHNWI profiles to evaluate data quality and documentation standards before making a procurement decision.
Frequently Asked Questions
Does the EU AI Act apply to AI systems developed outside the EU?
Yes. Article 2 establishes that the AI Act applies to providers placing AI systems on the EU market or putting them into service in the EU, regardless of where the provider is established. If your AI system affects persons in the EU, the Act likely applies.
Are all financial AI systems classified as high-risk?
No, but many are. Annex III, Point 5(b) specifically covers AI systems used for creditworthiness assessment and credit scoring. AI used for purely internal analytics without decisions affecting individuals may fall outside high-risk classification, but this requires careful legal analysis.
Can I use synthetic data to fully replace real data for AI training?
This depends on the use case. For validation, testing, and supplemental training, Born Synthetic data can replace real data entirely. For primary model training in production credit scoring, a hybrid approach combining synthetic data for development and carefully governed real data for final calibration is common.
What is the penalty for non-compliance with Article 10?
Under Article 99, non-compliance with Article 10 obligations can result in fines of up to EUR 15 million or 3% of total worldwide annual turnover, whichever is higher. For large financial institutions, the turnover-based calculation will typically apply.
How does Born Synthetic data differ from other synthetic data approaches?
Most synthetic data generators learn patterns from real datasets, meaning the generation process involves processing personal data. Born Synthetic data is generated from mathematical models (Pareto distributions, algebraic constraints, cultural archetypes) without any real data input. This distinction is legally significant: Born Synthetic generation does not constitute personal data processing under GDPR.
Last updated: March 2026
Learn more about EU AI Act training data and how Born Synthetic data addresses this in our glossary and comparison guides.
