**Article 10**

**Data Governance and Management Practices**  
Election Sentiment Transformer (EST) was developed under a data governance framework tailored to its intended purpose: analyzing and influencing public electoral sentiment derived from social media text streams. Design decisions prioritized publicly available social media data, sourced through platform APIs that document data origin and collection methods. Collection consisted of continuous ingestion of real-time posts, primarily in English and three major EU official languages, from platforms with explicit user consent policies. These platforms publicly stated the original purpose of data collection as enabling content analysis and trend detection, which aligns with the system’s use case.

Data preparation was a multi-stage process. Heuristic rules first filtered noise and spam; manual and automated annotation then classified sentiment polarity and contextual relevance. An annotation team of 25 linguistic experts labeled a dataset of approximately 10 million posts to ensure semantic consistency and cultural contextualization. Data cleaning removed duplicates and resolved language ambiguities through contextual embedding disambiguation.

Assumptions were explicitly documented. The central premise is that social media expressions reflect aggregate public sentiment trends rather than individual voter profiles, so the data is treated as a proxy for societal opinion dynamics. Audits of dataset availability confirmed the sufficiency of approximately 200 million unlabeled text posts collected over a 12-month period, complemented by 10 million manually validated samples that underpin the training, validation, and testing splits.
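The heuristic noise filtering and duplicate removal steps can be illustrated with a minimal sketch. It does not reproduce the production pipeline: the function names, the URL-count and token-count thresholds, and the normalization rules are all assumptions chosen for illustration.

```python
import hashlib
import re
import unicodedata

# Hypothetical pattern and thresholds; the source does not specify the actual rules.
URL_RE = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Canonicalize a post so trivially different copies hash identically."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = URL_RE.sub("<url>", text)
    return " ".join(text.split())  # collapse runs of whitespace


def looks_like_spam(text: str, max_urls: int = 2, min_tokens: int = 3) -> bool:
    """Flag very short posts and posts that carry many links (assumed heuristics)."""
    if len(text.split()) < min_tokens:
        return True
    return len(URL_RE.findall(text)) > max_urls


def clean_posts(posts):
    """Drop spam-like posts, then keep the first occurrence of each normalized text."""
    seen = set()
    kept = []
    for post in posts:
        if looks_like_spam(post):
            continue
        digest = hashlib.sha256(normalize(post).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(post)
    return kept
```

Hashing the normalized text catches exact and near-exact copies only. Paraphrased duplicates would need a similarity-based method, which this sketch does not attempt.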

Bias assessments addressed risks such as overrepresentation of vocal minority groups, potential amplification of extremist views, and unintended demographic exclusions. For example, an analysis of sentiment distribution found that rural socioeconomic indicators were underrepresented by approximately 18%, which could negatively affect fairness in electoral discourse. To mitigate these biases, data augmentation combined targeted collection from underrepresented geographic and demographic segments with synthetic oversampling techniques that follow current best practice. During deployment, bias detection algorithms continuously evaluate drift in representativeness and sentiment polarity and trigger retraining when corrective thresholds are exceeded. Data gaps, notably the limited representation of non-textual electoral discourse (e.g., images, videos), were documented as out of scope for the current system version and are subject to roadmap review for future modality integration.
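A representativeness check of the kind described above can be sketched as a comparison of observed group shares against reference shares, with a retraining flag when the relative gap exceeds a threshold. The group labels, the reference shares, and the 15% threshold are hypothetical; the source does not state how the 18% figure was computed or which thresholds are used in deployment.

```python
from collections import Counter


def representation_gaps(observed_labels, reference_shares):
    """Return the relative gap per group: (observed share - reference share) / reference share.

    A negative value means the group is underrepresented relative to the reference.
    """
    counts = Counter(observed_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to evaluate")
    gaps = {}
    for group, reference in reference_shares.items():
        observed = counts.get(group, 0) / total
        gaps[group] = (observed - reference) / reference
    return gaps


def needs_retraining(gaps, threshold=0.15):
    """Flag retraining when any group deviates by more than the (assumed) threshold."""
    return any(abs(gap) > threshold for gap in gaps.values())
```

For example, with 20% rural posts observed against a 25% reference share, the relative gap for the rural group is -0.20, which exceeds the assumed threshold and would raise the flag.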

**Relevance, Representativeness, and Data Quality**  
The training, validation, and testing datasets are relevant to and representative of the system’s purpose of trend detection and sentiment analysis within democratic electoral contexts. Dataset composition follows a balanced temporal stratification to capture evolving opinion trends before and during electoral campaigns. Statistical analyses confirmed that the datasets are complete over key variables, such as post timestamps, anonymized user metadata, and linguistic markers, with error rates below 0.5% as determined by cross-validation audits. Stratified sampling ensured coverage of demographic groups in line with EU population distributions, supporting statistically robust generalization. Representativeness was further validated by benchmarking against independent polling data and traditional sentiment indexes, which yielded Pearson correlation coefficients above 0.85 in retrospective temporal alignment studies. These measures support the datasets’ fitness for EST’s intended use and reduce the risk of overfitting or spurious correlations induced by skewed data distributions.
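The benchmarking step relies on the Pearson correlation coefficient between a model-derived sentiment series and an independent polling series aligned on the same time points. A minimal sketch of that computation follows; the function name and input layout are assumptions, and the alignment of the two series is assumed to have been done beforehand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient above 0.85, as reported above, indicates a strong linear relationship between the two series. It does not by itself establish that the sentiment signal is causally related to polling outcomes.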

**Contextual and Geographical Considerations**  
Datasets were curated to reflect the specifics of the EU electoral environment, including the geographical, social, and functional dimensions relevant to public discourse. Data stratification included country-level metadata when available, enabling the system to distinguish between national and regional sentiment trends and to account for linguistic dialects and cultural variations. Functional contextualization included tagging posts by topic cluster (such as policy debate, candidate evaluation, or electoral event commentary), identified through natural language processing pipelines that employ topic modeling and entity recognition. This granularity supports the system’s capacity to adapt its generation of strategic narratives to distinct electoral phases and jurisdictional particularities. Data localization controls filtered out data originating outside the EU or in unauthorized regions, both to adhere to regulatory constraints and to ensure that the analyzed sentiments corresponded directly to the targeted electorates.
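The data localization control can be sketched as a filter on country-level metadata. The record layout (a dict with a `country` key holding an ISO 3166-1 alpha-2 code) is an assumption, as is the choice to drop posts that lack country metadata; the source does not say how such posts are handled.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
    "FR", "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU",
    "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def filter_eu_posts(posts):
    """Keep only posts whose country code is an EU member state.

    Posts with missing or empty country metadata are dropped (assumed policy).
    """
    return [
        post for post in posts
        if (post.get("country") or "").upper() in EU_MEMBER_STATES
    ]
```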

**Use of Special Categories of Personal Data and Safeguards**  
EST does not incorporate any special categories of personal data in its training, validation, or testing phases, which obviates the need for exceptional processing under Article 10(5). All datasets are derived exclusively from publicly available or anonymized data streams, and no individual-level profiling or sensitive personal attributes (e.g., health, ethnicity, political opinions in a personal sense) are processed. The data pipeline applies pseudonymization protocols in which user identifiers are replaced with non-reversible hashes to prevent re-identification, in line with state-of-the-art privacy-preserving methodologies. Access to datasets is restricted through role-based access controls and encrypted storage layers, and detailed logs document all data handling. No data is shared outside Horizon Analytics Group, so that all governance controls around data confidentiality and integrity remain strictly enforced. Retention and deletion policies mandate automatic removal of any temporary data copies after model updates, in keeping with data minimization principles.
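The replacement of user identifiers with non-reversible hashes can be illustrated with a keyed hash. The source does not name the hashing scheme, so the use of HMAC-SHA256 with a secret key is an assumption; a keyed construction is shown because an unkeyed hash of guessable identifiers can be reversed by enumerating candidate inputs.

```python
import hashlib
import hmac
import secrets


def pseudonymize(user_id: str, key: bytes) -> str:
    """Map a user identifier to a stable pseudonym using HMAC-SHA256."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def new_pseudonymization_key() -> bytes:
    """Generate a random 256-bit key; in practice it would be stored separately from the data."""
    return secrets.token_bytes(32)
```

The same identifier maps to the same pseudonym under one key, which preserves per-account deduplication and counting. Destroying or rotating the key breaks the link between old and new pseudonyms.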

**Applicability of Data Quality Provisions for Non-Training Data**  
Because EST relies exclusively on AI models developed through machine learning techniques that involve substantial training on labeled and unlabeled datasets, the quality management obligations in paragraphs 2 through 5 of Article 10 apply in full to the training, validation, and testing datasets. No component of the system relies solely on fixed-rule or heuristic-based models that would be exempt from these provisions. Testing datasets maintained for ongoing performance evaluation and monitoring are updated twice a year to reflect shifting social media language and discourse patterns, and are subject to the same governance standards described above.