EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

ACL ARR 2026 January Submission1084 Authors

27 Dec 2025 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: moral alignment, large language models, chain-of-thought reasoning, LLM-as-judge, cultural bias, cross-cultural evaluation, World Values Survey, peer review, interpretability, ethics in AI
Abstract: We present EvalMORAAL (Evaluation of Moral Alignment with LLMs), a transparent chain-of-thought (CoT) framework that combines two scoring methods—log-probabilities and direct ratings—with a model-as-judge peer review to evaluate moral alignment in 20 large language models. We assess models against the World Values Survey, covering 55 countries and 19 topics, and the PEW Global Attitudes Survey, covering 39 countries and 8 topics. Under EvalMORAAL, top models align closely with survey responses, reaching a Pearson correlation of approximately 0.90 on the World Values Survey. However, we find a clear regional difference: Western regions average a correlation of 0.82 while non-Western regions average 0.61, an absolute gap of 0.21 that indicates consistent regional bias. Our framework adds three components: first, two scoring methods applied to all models to enable fair comparison; second, a structured chain-of-thought protocol with self-consistency checks; and third, a model-as-judge peer review that flags 348 conflicts using a data-driven threshold. Peer agreement correlates with survey alignment (r = 0.74 on the World Values Survey and r = 0.39 on the PEW survey; both p < 0.001). These findings support the use of automated quality checks and show real progress toward culture-aware AI, while also highlighting open challenges for deployment across different regions.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: cultural bias, value alignment, cross-cultural NLP, model evaluation, interpretability, LLM-as-judge, moral reasoning, fairness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1084