SAHM (سهم): A Benchmark for Arabic Financial and Shari’ah-Compliant Reasoning

ACL ARR 2026 January Submission 9742 Authors

06 Jan 2026 (modified: 20 Mar 2026); ACL ARR 2026 January Submission; CC BY 4.0
Keywords: Arabic NLP, financial NLP, Islamic finance, Shari’ah reasoning, question answering, benchmark dataset, instruction tuning, evidence-grounded evaluation
Abstract: English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari’ah-compliant reasoning. SAHM contains 14,380 expert-verified instances spanning seven tasks: AAOIFI standards QA, fatwa-based QA/MCQ, accounting and business exams, financial sentiment analysis, extractive summarization, and event–cause reasoning, curated from authentic regulatory, juristic, and corporate sources. We evaluate 19 strong open and proprietary LLMs using task-specific metrics and rubric-based scoring for open-ended outputs, and find that Arabic fluency does not reliably translate to evidence-grounded financial reasoning: models are substantially stronger on recognition-style tasks than on generation and causal reasoning, with the largest gaps on event–cause reasoning. We release the benchmark, evaluation framework, and an instruction-tuned model to support future research on trustworthy Arabic financial NLP.
Paper Type: Long
Research Area: Multilinguality and Language Diversity
Research Area Keywords: benchmarking, corpus creation, language resources, evaluation methodologies, reproducibility, NLP datasets, multilingual benchmarks, financial NLP
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: Arabic (Modern Standard Arabic)
Submission Number: 9742