LLMs and Islamic Fiqh: A Reliability Study Grounded in Maliki Jurisprudential Principles

Published: 24 Nov 2025, Last Modified: 24 Nov 2025 | 5th Muslims in ML Workshop co-located with NeurIPS 2025 | CC BY 4.0
Keywords: Large Language Models, Islamic Jurisprudence, GPT-4, ALLaM, Religious Question Answering, Legal AI
TL;DR: We evaluate GPT-4 and ALLaM on Maliki Fiqh questions, showing GPT-4 performs better but both struggle with nuanced rulings, highlighting the need for domain-specific adaptation.
Abstract: In recent years, large language models have become increasingly prevalent in knowledge-based domains, including religion. However, their reliability on domain-specific religious questions remains underexplored. To address this gap, this study evaluates GPT-4 and ALLaM on Islamic jurisprudence (Fiqh) questions based on the Maliki school. We construct a dataset from Maliki sources and test the models across three domains. Results show that GPT-4 consistently outperformed ALLaM; however, both models exhibited significant limitations that affected their reliability in answering domain-specific questions. The models struggled with nuanced rulings requiring deep contextual understanding and showed sensitivity to prompt phrasing. These findings highlight the challenges of applying general-purpose LLMs in religious domains and underscore the need for domain adaptation or retrieval-based enhancements.
Track: Track 1: ML on Islamic Content / ML for Muslim Communities
Submission Number: 13