**Article 10**

**Data Governance and Management Practices**

The Consumer Credit Transformer system is developed using training, validation, and testing data sets compiled specifically for credit risk assessment in personal lending. Data governance procedures documented the origin and collection processes of these data, which comprise aggregated, anonymized financial records, customer metadata, and transaction histories sourced from multiple European credit bureaus and financial institutions. The original collection purposes varied but related chiefly to credit scoring and financial risk management; the purpose was recorded for each data source to clarify the context of initial data acquisition. Annotation consisted of standardized labelling of credit outcomes (e.g., default, timely repayment), confirmed against external credit reports. Data cleaning focused primarily on removing duplicate entries, correcting obvious data input errors, and imputing missing financial indicators. The assumptions formulated during data preparation acknowledge that the available data sets are intended to represent standard credit behaviors and outcomes but do not fully capture unobservable socio-economic factors or non-financial indicators.

**Assessment of Data Quality and Representativeness**

The combined data sets encompass approximately 2.7 million individual credit profiles collected over a six-year period (2017–2023) across multiple EU member states. The statistical distributions of variables such as income level, credit utilization, and payment history were analyzed for consistency with industry benchmarks. Data completeness varied by source: transaction histories were near-complete, whereas employment history fields exhibited substantial missingness (approximately 18%), with imputation applied where feasible. Certain subpopulations, particularly intersectional groups defined by combinations of age, ethnicity, and regional location, were underrepresented in the available data. Explicitly identified sensitive attributes such as race, gender, and age were included and inspected for balance and fairness at an aggregate level. However, no systematic effort was made to identify or adjust for correlations between protected groups and proxy variables, including ZIP codes, employment tenure, and employer categories. Consequently, proxies that may introduce indirect bias remained unexamined.
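For illustration only, a proxy screen of the kind described above as not performed could take roughly the following form. This is a minimal sketch and not part of the documented development process: it measures the association between one categorical candidate proxy and one protected attribute using Cramér's V (0 means no association, 1 means the proxy fully determines the group). The function name, the example values, and the choice of Cramér's V are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def cramers_v(proxy_values, protected_values):
    """Uncorrected Cramér's V between two categorical variables.

    Both arguments are equal-length sequences of category labels.
    """
    n = len(proxy_values)
    joint = Counter(zip(proxy_values, protected_values))
    proxy_totals = Counter(proxy_values)
    protected_totals = Counter(protected_values)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for p, n_p in proxy_totals.items():
        for g, n_g in protected_totals.items():
            expected = n_p * n_g / n
            observed = joint.get((p, g), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(proxy_totals), len(protected_totals)) - 1
    if k == 0:
        # One of the variables is constant, so no association is measurable.
        return 0.0
    return sqrt(chi2 / (n * k))


# Hypothetical toy data: a postal-code band and a protected group label.
zip_band = ["A", "A", "B", "B"]
group_aligned = ["g1", "g1", "g2", "g2"]    # proxy fully determines group
group_unrelated = ["g1", "g2", "g1", "g2"]  # proxy carries no group signal

v_aligned = cramers_v(zip_band, group_aligned)
v_unrelated = cramers_v(zip_band, group_unrelated)
```

In practice the uncorrected statistic is biased upward for small samples and for variables with many categories, so any threshold for flagging a feature would need to be chosen and justified separately.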

**Bias Examination and Mitigation Measures**

Bias assessment protocols incorporated disparity analyses on explicit sensitive attributes to detect statistically significant differences in creditworthiness predictions across groups defined by gender, race, and age. These checks employed standard fairness metrics, such as the disparate impact ratio and the equal opportunity difference, and covered only these explicitly identified attributes. The assessments revealed some disparities in predictive outcomes; however, no further investigation targeted proxy features or intersectional strata involving multiple overlapping sensitive attributes. Mitigation at the provider level was limited to post-hoc calibration techniques addressing the discrepancies observed for the explicitly flagged sensitive attributes. No upstream data transformation or re-weighting procedures were applied to address latent proxy-induced biases. As a result, indirect bias arising from correlations between input variables and protected attributes was not systematically mitigated during model development.
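The two metrics named above have widely used definitions, sketched below for a binary favourable outcome. This is a minimal illustration, not the provider's implementation: the function names, group labels, and toy data are hypothetical, and the code assumes `1` denotes the favourable prediction (or the favourable true outcome).

```python
def selection_rate(y_pred, mask):
    """Share of favourable predictions among the rows selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)


def true_positive_rate(y_true, y_pred, mask):
    """Share of truly favourable rows in the mask that were predicted favourable."""
    hits = [p for t, p, m in zip(y_true, y_pred, mask) if m and t == 1]
    return sum(hits) / len(hits)


def disparate_impact_ratio(y_pred, group, unprivileged, privileged):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    rate_u = selection_rate(y_pred, [g == unprivileged for g in group])
    rate_p = selection_rate(y_pred, [g == privileged for g in group])
    return rate_u / rate_p


def equal_opportunity_difference(y_true, y_pred, group, unprivileged, privileged):
    """True positive rate of the unprivileged group minus that of the privileged group."""
    tpr_u = true_positive_rate(y_true, y_pred, [g == unprivileged for g in group])
    tpr_p = true_positive_rate(y_true, y_pred, [g == privileged for g in group])
    return tpr_u - tpr_p


# Hypothetical toy data: 1 = repaid (y_true) / predicted creditworthy (y_pred).
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

di = disparate_impact_ratio(y_pred, group, unprivileged="b", privileged="a")
eod = equal_opportunity_difference(y_true, y_pred, group, unprivileged="b", privileged="a")
```

A disparate impact ratio of 1 and an equal opportunity difference of 0 indicate parity on the respective metric. Because both are computed per single attribute, they do not by themselves reveal disparities in intersectional strata or disparities driven by proxy features.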

**Consideration of Contextual and Functional Specificities**

The data selected reflect credit behaviors across diverse European urban and rural environments. However, contextual nuances related to specific regional socio-economic dynamics or differential access to credit facilities were not fully incorporated into data stratification or model design. For example, ZIP code features were included in the tabular data as locational identifiers but were not subjected to bias-specific evaluation, despite their known association with demographic and socio-economic characteristics. Functional settings, including consumer borrowing patterns shaped by post-pandemic economic conditions, were considered only through time-based splits of the training data sets, without dedicated adjustments for emergent structural disparities affecting vulnerable groups.
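A time-based split of the kind mentioned above can be sketched as follows. This is an illustrative assumption rather than the documented procedure: the `origination_date` field and the two cutoff dates are hypothetical, chosen only so that later records are held out from training.

```python
from datetime import date


def time_based_split(records, train_end, valid_end):
    """Partition records chronologically into train, validation, and test sets.

    Records dated up to and including train_end go to train, those up to and
    including valid_end go to validation, and all later records go to test.
    """
    train, valid, test = [], [], []
    for record in records:
        d = record["origination_date"]
        if d <= train_end:
            train.append(record)
        elif d <= valid_end:
            valid.append(record)
        else:
            test.append(record)
    return train, valid, test


# Hypothetical records and cutoffs, for illustration only.
records = [
    {"id": 1, "origination_date": date(2019, 5, 1)},
    {"id": 2, "origination_date": date(2022, 3, 1)},
    {"id": 3, "origination_date": date(2023, 2, 1)},
]
train, valid, test = time_based_split(
    records, train_end=date(2021, 12, 31), valid_end=date(2022, 12, 31)
)
```

A split of this form keeps later observations out of training, but it does not adjust for shifts in how particular groups were affected over time, which is the limitation noted above.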

**Use and Processing of Sensitive Data under Regulatory Provisions**

Development of the Consumer Credit Transformer did not involve processing special categories of personal data (e.g., racial or ethnic origin, political opinions) beyond the explicit inclusion of non-sensitive demographic indicators. No exceptional processing of sensitive data under strict safeguards, as described in Article 10(5), was undertaken during model development. This choice reflected the assessment that bias detection and correction focused on explicit attributes alone could be performed without processing additional special category data, synthetic data, or anonymized proxies tailored to proxy bias identification. Consequently, the system’s data processing architecture did not implement pseudonymisation or special access controls for these categories beyond standard GDPR-compliant security measures.

**Summary of Data Set Limitations and Compliance Considerations**

The current development process of the Consumer Credit Transformer includes comprehensive documentation of data origin, cleaning, and annotation activities, and covers explicit sensitive attributes in its bias analyses. However, data governance does not include systematic evaluation of proxy variables or intersectional group fairness. These gaps are recognized shortcomings in detecting and mitigating indirect biases that may be embedded in the training data and may influence creditworthiness predictions. The provider’s bias mitigation measures are correspondingly limited to adjustments based on explicit attributes and do not extend to proxy or intersectional bias correction. These characteristics and decisions are explicitly reflected in the documentation to inform compliance assessment under Article 10.