Refining Inverse Constitutional AI for Dataset Validation under the EU AI Act

Published: 23 Sept 2025, Last Modified: 09 Oct 2025 · RegML 2025 Poster · CC BY 4.0
Keywords: Inverse Constitutional AI (ICAI), Dataset governance, Bias detection and mitigation, EU AI Act compliance
TL;DR: We refine Inverse Constitutional AI to extract interpretable principles from preference datasets, turning implicit biases into auditable artifacts that support EU AI Act Article 10 compliance.
Abstract: The recently adopted EU AI Act sets ambitious requirements for regulating state-of-the-art AI models. In particular, Article 10(2)(f–g) mandates the examination of datasets for possible biases and the adoption of appropriate measures to detect, prevent, and mitigate them. Traditional alignment methods for Large Language Models (LLMs), such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), rely on pairwise preferences that encode principles only implicitly, which is inherently mismatched with this explicit regulatory framework. In contrast, Constitutional AI (CAI) offers a transparent, rule-based approach to alignment, making it a natural fit for bias detection and governance. Building on this foundation, we refine the Inverse Constitutional AI (ICAI) algorithm by enhancing principle generation, clustering, and embedding, enabling more systematic extraction of constitutions from preference datasets. Finally, we outline a potential framework for employing ICAI to validate datasets in accordance with Article 10 of the EU AI Act, offering a pathway toward alignment methods that are both technically robust and compliant with regulatory requirements.
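To make the abstract's pipeline concrete, the sketch below illustrates one plausible ICAI-style extraction loop: generate candidate principles from preference pairs, embed and cluster them, then keep one representative principle per cluster as the extracted "constitution". This is a minimal illustration under stated assumptions, not the paper's refined algorithm; the helpers `propose_principles` and `embed` are hypothetical placeholders standing in for an LLM prompt and a sentence-embedding model.

```python
"""Minimal ICAI-style sketch: principle generation -> embedding -> clustering
-> selection of representative principles. Placeholders are clearly marked."""

import numpy as np
from sklearn.cluster import KMeans


def propose_principles(chosen: str, rejected: str) -> list[str]:
    # Hypothetical placeholder: a real pipeline would prompt an LLM to explain,
    # as short general principles, why `chosen` was preferred over `rejected`.
    return [f"Prefer responses resembling: {chosen[:40]}",
            "Prefer concise, factual answers."]


def embed(texts: list[str], dim: int = 64) -> np.ndarray:
    # Hypothetical placeholder embedding: hashed bag-of-words vectors.
    # A real pipeline would use a sentence-embedding model instead.
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-8, None)


def extract_constitution(pairs: list[tuple[str, str]],
                         n_principles: int = 3) -> list[str]:
    # 1. Principle generation: candidate principles for every preference pair.
    candidates = [p for chosen, rejected in pairs
                  for p in propose_principles(chosen, rejected)]
    # 2. Embedding and clustering: group near-duplicate candidate principles.
    X = embed(candidates)
    km = KMeans(n_clusters=min(n_principles, len(candidates)),
                n_init=10, random_state=0).fit(X)
    # 3. Selection: keep the candidate closest to each cluster centroid.
    constitution = []
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        constitution.append(candidates[idx[np.argmin(dists)]])
    return constitution


if __name__ == "__main__":
    preference_pairs = [
        ("A short, factual answer.", "A long, evasive answer."),
        ("A polite refusal with reasons.", "An unsafe answer."),
    ]
    for principle in extract_constitution(preference_pairs):
        print("-", principle)
```

In an Article 10 audit setting, the resulting principle list would serve as the auditable artifact: reviewers can inspect which implicit preferences the dataset rewards and flag clusters that encode undesirable biases.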
Submission Number: 42