Interpretable Arabic Readability Assessment Using Linguistic Rules and Expert-Guided Annotations

ACL ARR 2026 January Submission 9943 Authors

06 Jan 2026 (modified: 20 Mar 2026)
License: CC BY 4.0
Keywords: Readability, Arabic, Linguistic Annotation
Abstract: This work focuses on explaining Arabic readability decisions by explicitly modeling the linguistic phenomena underlying text complexity. Building on the Balanced Arabic Readability Evaluation Corpus (BAREC), we introduce BAREC-X, a new dataset of 1,793 sentences annotated with expert-provided linguistic justifications aligned with the BAREC guidelines, enabling direct evaluation of explanation quality. We further propose a fully interpretable, rule-based readability model grounded in linguistically motivated features spanning morphology, syntax, vocabulary, syllabic structure, and content complexity. The model mirrors the BAREC annotation process, produces structured human-readable explanations, and supports both readability level prediction and linguistic reasoning generation. As a final contribution, we present the first reasoning-annotated Arabic readability dataset, achieving an average inter-annotator agreement of 93.3\% in terms of at least one shared justification. We also report the first results for an automated Arabic readability reasoning system, which attains 65.8\% agreement with human annotators under the same criterion.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Readability, Linguistic Annotation, Arabic NLP
Contribution Types: Data resources, Data analysis
Languages Studied: Arabic
Submission Number: 9943