**Article 10**  

**Data Governance and Management Framework**  
The development of the Legal Termination Assessment Framework adhered to a defined data governance protocol aligned with the intended purpose of assessing contract termination eligibility and risks. The primary training corpus comprised approximately 85,000 full-time employee contracts, sourced from multiple corporate human resources departments across diverse sectors within the European Union. The data originated from standardized contract templates and historical termination case documentation, collected under explicit consent and governed by internal HR policies for lawful processing. Data preparation involved comprehensive annotation of contract clauses, termination conditions, and associated contextual employment metadata by legal experts and trained annotators. Cleaning procedures resolved inconsistencies, removed redundant entries, and anonymized personal identifiers in line with GDPR and related regulations. The corpus is assumed to represent the typical language and structure of full-time employment contracts, with a focus on the clauses most frequently implicated in termination decisions.  

**Assessment of Data Suitability, Representativeness, and Quality**  
A detailed evaluation was conducted to assess dataset coverage relative to the system’s termination assessment objective. While the corpus robustly represents full-time employment contract structures, controlled internal audits identified a notable underrepresentation of part-time, temporary, and precarious employment contracts. Each of these categories constituted less than 5% of the training data, which is insufficient for statistically reliable model generalization in those contexts. Validation metrics show that accuracy on contract termination predictions exceeded 92% for full-time contracts but fell to approximately 76% on the limited sample of part-time and non-standard contracts. The training data sets are therefore statistically sound in coverage and completeness for full-time employment contracts but lack balance across employment modalities, which limits representativeness for those subpopulations. Error and missing-data rates were below 2% for the primary contract type, with comprehensive measures taken to ensure annotation quality and data consistency within this domain.  
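
The per-category comparison described above can be sketched as follows. This is a minimal illustration rather than the provider's evaluation code: the function name `accuracy_by_group`, the group labels, and the toy records are assumptions introduced here. Reporting the sample size next to each accuracy figure makes it visible when a subgroup estimate rests on very few examples.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: (accuracy, n)} for (group, y_true, y_pred) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: (correct[g] / total[g], total[g]) for g in total}


# Toy records invented for illustration only (not the provider's validation data).
toy_records = [
    ("full_time", 1, 1),
    ("full_time", 0, 0),
    ("full_time", 1, 1),
    ("full_time", 0, 1),
    ("part_time", 1, 0),
    ("part_time", 0, 0),
]

report = accuracy_by_group(toy_records)
for group, (acc, n) in sorted(report.items()):
    print(f"{group}: accuracy={acc:.2f} on n={n}")
```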

**Bias Identification and Mitigation Measures**  
Following an established bias assessment methodology, the provider executed subgroup performance analyses to detect biased outcomes affecting protected groups and employment categories. This analysis showed measurably lower reliability when assessing contract termination risk for part-time, temporary, and precarious employment contracts. In acknowledgment of this data gap, the provider implemented weighted loss functions during model training to partially compensate for minority class imbalance. Additionally, augmentation techniques, including synthetic data generation for underrepresented contract types, were explored; however, these approaches yielded only marginal improvements and were not integrated into the final production model due to concerns about semantic fidelity and legal accuracy. No special categories of personal data were processed; only baseline anonymized employment information was used, in compliance with data protection standards. Ongoing bias monitoring protocols are in place during post-market surveillance to detect performance degradation or unfair outcomes associated with underrepresented contract types.  
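
The source does not specify the weighting scheme, so the following is only a sketch of one common approach: inverse-frequency weights per contract type applied to a binary cross-entropy loss. The function names, the group labels, and the toy inputs are hypothetical, and the binary termination-risk label is an assumption made for the example.

```python
import math
from collections import Counter


def group_weights(groups):
    """Inverse-frequency weights: n_samples / (n_groups * group_count)."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def weighted_log_loss(y_true, p_pred, groups, weights):
    """Weighted mean binary cross-entropy; p_pred[i] is P(label == 1)."""
    eps = 1e-12  # clip probabilities so log() never receives 0
    num = den = 0.0
    for y, p, g in zip(y_true, p_pred, groups):
        p = min(max(p, eps), 1 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        num += weights[g] * loss
        den += weights[g]
    return num / den


# Toy data: 8 full-time and 2 part-time contracts (illustrative only).
groups = ["full_time"] * 8 + ["part_time"] * 2
weights = group_weights(groups)
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.2, 0.9, 0.1, 0.8, 0.2, 0.4, 0.6]
loss = weighted_log_loss(y_true, p_pred, groups, weights)
```

With these weights each contract type contributes the same total weight to the loss, so errors on the two part-time samples count as much in aggregate as errors on the eight full-time samples. As the text notes, such reweighting only partially compensates for missing data; it cannot add information the corpus does not contain.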

**Recognition and Documentation of Data Gaps**  
Comprehensive documentation explicitly identifies the limited scope of contract types within the training data and its implications for model reliability in non-standard employment contexts. The provider has flagged this limitation in the system’s technical specifications and user manuals, advising users that assessments of contracts other than standard full-time contracts may be less accurate. Measures to address these gaps, namely expanded data acquisition and collaborative partnerships to collect more diverse employment contracts, remain under consideration but were not executed during the current development cycle. This transparency supports informed deployment decisions and risk management in operational contexts.  

**Technical Architecture Supporting Data Quality Assurance**  
The AI system uses a dual-model architecture that combines gradient-boosted decision trees (GBDT) trained on structured employee data with transformer-based natural language processing (NLP) models that interpret unstructured contract texts. Both models receive input processed through a unified data validation pipeline, which incorporates schema enforcement, duplicate removal, and semantic consistency checks. Training datasets were split into stratified folds to preserve distributional characteristics, with continuous integration pipelines automating model retraining and validation against benchmark datasets representative of full-time contracts. Version control and audit trails ensure traceability of dataset versions and preprocessing steps, facilitating reproducibility and accountability. The system’s deployment platform monitors input data conformity at runtime and flags contracts that fall outside the well-represented training domain for additional human review.  
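
As a rough sketch of how the validation and flagging steps above might fit together, the following checks each record against a schema, drops duplicates, and routes contracts outside the well-represented domain to human review. The field names, the `contract_id` deduplication key, and the function `validate_batch` are hypothetical; semantic consistency checks are not shown.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"contract_id": str, "contract_type": str, "text": str}

# Contract types treated as well represented, per the coverage assessment above.
WELL_REPRESENTED_TYPES = {"full_time"}


def validate_batch(records):
    """Split records into (accepted, rejected, needs_review)."""
    accepted, rejected, needs_review = [], [], []
    seen_ids = set()
    for rec in records:
        # Schema enforcement: every required field present with the expected type.
        schema_ok = all(isinstance(rec.get(f), t) for f, t in REQUIRED_FIELDS.items())
        if not schema_ok:
            rejected.append((rec, "schema"))
            continue
        # Duplicate removal, keyed on contract_id.
        if rec["contract_id"] in seen_ids:
            rejected.append((rec, "duplicate"))
            continue
        seen_ids.add(rec["contract_id"])
        accepted.append(rec)
        # Out-of-domain contracts are still processed but flagged for human review.
        if rec["contract_type"] not in WELL_REPRESENTED_TYPES:
            needs_review.append(rec["contract_id"])
    return accepted, rejected, needs_review


# Toy batch invented for illustration.
batch = [
    {"contract_id": "c1", "contract_type": "full_time", "text": "Sample clause A"},
    {"contract_id": "c1", "contract_type": "full_time", "text": "Sample clause A"},
    {"contract_id": "c2", "contract_type": "part_time", "text": "Sample clause B"},
    {"contract_id": "c3", "contract_type": "temporary"},  # missing "text"
]
accepted, rejected, needs_review = validate_batch(batch)
```

Keeping the rejection reason alongside each rejected record is one way to feed the audit trail mentioned above.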

**Data Protection and Security Measures**  
Data handling complies with GDPR and related EU data protection frameworks, employing encryption at rest and in transit, role-based access controls, and activity logging. No special categories of personal data were ingested or processed, and processing was limited to what is strictly necessary for contract evaluation. Pseudonymisation was applied where applicable to reduce re-identification risks. Access to training and validation datasets is restricted to authorized personnel under confidentiality agreements, with quarterly security audits to verify that these safeguards remain effective. Retention policies mandate secure deletion of datasets no later than two years after the last model training cycle, consistent with legal and ethical requirements.  
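
The two-year retention limit can be expressed as a simple date check. The sketch below assumes the clock starts on the end date of the last training cycle, which the text does not state explicitly; the function names are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 2  # retention limit stated in the policy above


def deletion_deadline(last_training_end: date, years: int = RETENTION_YEARS) -> date:
    """Latest permissible deletion date, `years` after the last training cycle ended."""
    try:
        return last_training_end.replace(year=last_training_end.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February,
        # the earlier (more conservative) of the two neighbouring dates.
        return last_training_end.replace(year=last_training_end.year + years, day=28)


def is_overdue(last_training_end: date, today: date) -> bool:
    """True if the dataset should already have been securely deleted."""
    return today > deletion_deadline(last_training_end)
```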

**Summary of Provider Decisions Regarding Data Use**  
Given the necessity of maintaining high accuracy and legal interpretability in contract termination assessments, the provider prioritized dataset quality and consistency within the full-time employment context, where the bulk of relevant contract data is available. The decision not to expand or artificially inflate the share of part-time, temporary, or precarious contract examples was driven by concerns over variable data quality and the semantic complexity inherent to these less standardized employment forms. Instead, the provider opted for transparency regarding these representational limitations and incorporated system safeguards to mitigate unintended consequences. This approach reflects a balance between model performance integrity, legal compliance, and ethical responsibility in managing known data limitations.