Do Language Models Understand Discrimination? Testing Alignment with Human Legal Reasoning under the ECHR

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 | MoFA Poster | CC BY 4.0
Keywords: legal AI, legal alignment, human feedback
Abstract: We investigate the extent to which large language models (LLMs) are aligned with established legal norms by evaluating their ability to reason about discrimination under the European Convention on Human Rights (ECHR). Whereas existing work on bias in AI focuses primarily on statistical disparities, our study shifts the emphasis to normative reasoning: testing whether LLMs can interpret, apply, and justify legal decisions in line with formal legal standards. We introduce a structured framework grounded in ECHR case law, formalising the legal concept of discrimination into testable scenarios. Our empirical findings reveal that current LLMs frequently fail to replicate key aspects of legal reasoning, such as identifying protected characteristics, applying proportionality analysis, or articulating justifications consistent with judicial logic. These results expose critical gaps in the legal alignment of today’s models and point to the need for domain-specific feedback and normative alignment methods to build trustworthy and fair AI systems for high-stakes applications.
Submission Number: 45