Keywords: Quranic Expression Extraction, Context-Aware Quran Verse Detection, Islamic Text Processing
TL;DR: A context-aware tool for extracting Quranic verses from input text using an Arabic language model and regex-based matching.
Abstract: With the increasing use of Quranic expressions in online discourse, religious content, and modern Arabic writing, there is a growing need for tools that can automatically and accurately detect references to the Holy Quran. Furthermore, large language models (LLMs) often generate hallucinated or inaccurate Quranic content, highlighting the importance of tools capable of verifying and correcting such outputs. To address these challenges, this paper presents a multi-layered tool for extracting Quranic expressions from arbitrary input text. A central challenge in this task lies in distinguishing between intentional references and incidental lexical overlap with Quranic text. The proposed tool combines an Arabic language model with rule-based techniques to achieve high precision and contextual understanding. The language model identifies expressions likely intended as Quranic references, effectively filtering out irrelevant matches. These candidate expressions are then verified using regular expression patterns to ensure textual accuracy, returning their span in the input text along with the corresponding Surah and verse number. This hybrid framework enables context-sensitive and semantically accurate extraction of Quranic references, supporting applications in digital humanities, Islamic scholarship, and the enhancement of Quranic content presentation in AI-generated text. The tool will be made publicly available.
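The regex-based verification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-verse lookup table, the function names `verse_pattern` and `find_verses`, and the choice to tolerate optional diacritics are all assumptions made for illustration. The language-model filtering stage is omitted; the sketch assumes the text passed in has already been flagged as a likely Quranic reference.

```python
import re

# Arabic diacritics (U+064B-U+0652), superscript alef (U+0670), tatweel (U+0640).
DIACRITICS = "[\u064B-\u0652\u0670\u0640]"

# Hypothetical two-verse lookup table keyed by (surah, verse);
# a real tool would load the full Quran text instead.
VERSES = {
    (112, 1): "قل هو الله أحد",
    (1, 2): "الحمد لله رب العالمين",
}


def strip_diacritics(text):
    """Remove diacritics and tatweel so verses are stored in a bare form."""
    return re.sub(DIACRITICS, "", text)


def verse_pattern(verse):
    """Build a regex that matches the verse with or without diacritics.

    Each letter may be followed by any number of diacritic marks, and
    words may be separated by any run of whitespace.
    """
    words = strip_diacritics(verse).split()
    word_patterns = [
        "".join(re.escape(ch) + DIACRITICS + "*" for ch in word)
        for word in words
    ]
    return re.compile(r"\s+".join(word_patterns))


PATTERNS = {ref: verse_pattern(verse) for ref, verse in VERSES.items()}


def find_verses(text):
    """Return the span, surah, and verse number of each verse found in text."""
    results = []
    for (surah, verse), pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append(
                {"span": match.span(), "surah": surah, "verse": verse}
            )
    return results
```

Because the pattern is matched against the original input rather than a normalized copy, the returned span indexes directly into the input text, which is the output format the abstract describes (span plus Surah and verse number).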
Submission Number: 81