IslamTrust: A Benchmark for LLMs Alignment with Islamic Values

Abderraouf Lahmar; Md Easin Arafat; Zakarya Farou; Mufti Mahmud

Published: 24 Nov 2025, Last Modified: 24 Nov 2025. Venue: 5th Muslims in ML Workshop co-located with NeurIPS 2025. License: CC BY 4.0

Keywords: Large Language Models, LLMs, Benchmark, dataset, Islamic values, Islam, LLM alignment, Ethics, Arabic LLMs, Islamic LLMs, Multilingual benchmarks, Islamic LLM benchmark

TL;DR: A multilingual benchmark for evaluating LLM alignment with Islamic values.

Abstract: The alignment of most Large Language Models (LLMs) to broad, often non-Islamic ethical principles creates a significant gap for users from specific cultural and religious backgrounds. LLMs used within Muslim communities for Islamic Q&A should be based on Islamic ethics, derived from scholarly consensus. A standardized benchmark that can evaluate this is currently absent; hence, this work introduces IslamTrust, a novel, multilingual benchmark that is designed to evaluate the alignment of LLMs with consensus-based Islamic ethical principles across Sunni schools of thought. The dataset used in IslamTrust is built upon guidelines that ensure objectivity. To demonstrate its usability, a comparative analysis of leading Arabic-focused LLMs in both Arabic and English was conducted. Results indicate that LLMs struggle significantly with Islamic values, exhibiting biases and misconceptions. The best-performing model achieved an overall alignment of only 66.5%, with a better score in Arabic (71.43%) than in English (61.58%). Interestingly, when models were evaluated for their logical consistency regarding miraculous events and questions involving interfaith knowledge, they performed noticeably better in Arabic than in English. The analyses suggest that shortcomings stem from the limited representation of Islamic ethical discourse in training data, inadequate handling of culturally specific contexts, and a tendency for models to default to generalized or non-Islamic knowledge when faced with ambiguous prompts. The source code and dataset for the IslamTrust implementation can be found at https://github.com/aii-lab-dot-org/IslamTrust and https://huggingface.co/datasets/Abderraouf000/IslamTrust-benchmark, respectively.
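
As a rough illustration of how the reported figures relate, the sketch below checks that the overall score of 66.5% is numerically consistent with an unweighted mean of the Arabic and English scores. The abstract does not state how the overall figure is aggregated, so the macro-average is an assumption; the helper names `macro_average` and `alignment_percent` are hypothetical, and exact-match scoring is only one plausible way a per-language score could be computed.

```python
from statistics import mean

# Per-language alignment scores (percent) for the best-performing model,
# as reported in the abstract.
reported = {"arabic": 71.43, "english": 61.58}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean across languages.

    Assumption: the abstract does not say how the overall score is
    aggregated; this is one aggregation that matches the reported number.
    """
    return mean(scores.values())


def alignment_percent(predictions: list[str], references: list[str]) -> float:
    """Hypothetical per-language score: percent of answers equal to the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one item is required")
    matches = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


overall = macro_average(reported)
print(f"overall (assumed macro-average): {overall:.1f}%")
```

Under this assumption the mean is 66.505, which rounds to the reported 66.5%. If the two language subsets differ in size, a weighted average would give a different value, so the match should be read as a consistency check rather than a confirmation of the method.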

Track: Track 1: ML on Islamic Content / ML for Muslim Communities

Submission Number: 31
