TCE at IslamicEval 2025: Retrieval-Augmented LLMs for Quranic and Hadith Content Identification and Verification
Keywords: IslamicEval 2025, Hallucination Detection, Fact Verification, Retrieval-Augmented LLM (RAG), Large Language Models (LLMs), Quranic NLP, Hadith NLP, Arabic NLP, Prompt Engineering, Few-shot Learning, Span Extraction, TF-IDF, Fuzzy Search, Lexical Search, Qwen, GPT-4o
TL;DR: A hybrid system for IslamicEval 2025 combines LLMs (Qwen, GPT-4o) with TF-IDF search to identify and verify Quranic and Hadith quotes in LLM-generated text. It achieves a macro-averaged F1 score of 86.11% on identification (Subtask 1A) and an accuracy of 89.82% on verification (Subtask 1B).
Abstract: Recent advancements in large language models (LLMs) have opened new possibilities for processing complex natural language tasks, including those involving highly regarded religious content. However, working with divine sources such as the Holy Quran and Hadith presents unique challenges. These Classical Arabic texts have, for centuries, been meticulously preserved and recited word-for-word, leaving no tolerance for errors: even a single incorrect diacritic can entirely alter the meaning. Such sensitivity demands exceptional precision, as hallucinations or inaccuracies from LLMs could lead to significant misinterpretations among general users.
To address this challenge, we present an Arabic-focused, LLM-powered framework designed to identify and verify the integrity of religious text generated by widely used LLMs.
Evaluation on benchmark subtasks demonstrates strong performance, achieving a macro-averaged F1 score of 86.11% on Subtask 1A and an accuracy of 89.82% on Subtask 1B.
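The retrieve-then-compare idea behind verification can be illustrated with a minimal sketch. It is not the system's actual implementation: the character-trigram TF-IDF index, the helper names (`TfidfIndex`, `verify`), the `strict` flag, and the four-line `PASSAGES` corpus are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import unicodedata
from collections import Counter

# Hypothetical toy corpus (undiacritized opening verses), for illustration only.
PASSAGES = [
    "بسم الله الرحمن الرحيم",
    "الحمد لله رب العالمين",
    "الرحمن الرحيم",
    "مالك يوم الدين",
]


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Arabic tashkeel are Unicode category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of the diacritic-free, whitespace-collapsed text."""
    text = collapse_spaces(strip_diacritics(text))
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


class TfidfIndex:
    """Tiny TF-IDF index over character trigrams with cosine scoring."""

    def __init__(self, passages):
        self.passages = list(passages)
        counts = [Counter(char_ngrams(p)) for p in self.passages]
        doc_freq = Counter(g for c in counts for g in c)
        n_docs = len(self.passages)
        # Smoothed inverse document frequency.
        self.idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0
                    for g, d in doc_freq.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts):
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {g: w / norm for g, w in vec.items()}

    def search(self, query: str):
        """Return the best-matching passage and its cosine similarity."""
        q = self._weigh(Counter(char_ngrams(query)))
        scores = [sum(w * v.get(g, 0.0) for g, w in q.items())
                  for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.passages[best], scores[best]


def verify(quote: str, index: TfidfIndex, strict: bool = False):
    """Retrieve the closest passage, then check the quote appears in it verbatim.

    strict=False ignores diacritics; strict=True requires them to match too.
    """
    passage, _score = index.search(quote)
    if strict:
        prepare = lambda s: collapse_spaces(unicodedata.normalize("NFC", s))
    else:
        prepare = lambda s: collapse_spaces(strip_diacritics(s))
    return prepare(quote) in prepare(passage), passage
```

In this sketch, `verify("الحمد لله رب العالمين", TfidfIndex(PASSAGES))` accepts the quote, while a version with the last word swapped is rejected because it no longer appears verbatim in any passage. Retrieval ignores diacritics so that a mis-vocalized quote can still find its source; the `strict` flag then decides whether the final comparison tolerates diacritic differences.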
Submission Number: 5