Keywords: Readability, Arabic, Linguistic Annotation
Abstract: This work focuses on explaining Arabic readability decisions by explicitly modeling the linguistic phenomena underlying text complexity. Building on the Balanced Arabic Readability Evaluation Corpus (BAREC), we introduce BAREC-X, a new dataset of 1,793 sentences annotated with expert-provided linguistic justifications aligned with the BAREC guidelines, enabling direct evaluation of explanation quality. We further propose a fully interpretable, rule-based readability model grounded in linguistically motivated features spanning morphology, syntax, vocabulary, syllabic structure, and content complexity. The model mirrors the BAREC annotation process, produces structured human-readable explanations, and supports both readability level prediction and linguistic reasoning generation.
As a final contribution, we present the first reasoning-annotated Arabic readability dataset, achieving an average inter-annotator agreement of 93.3\%, measured as the proportion of cases in which annotators share at least one justification. We also report the first results for an automated Arabic readability reasoning system, which attains 65.8\% agreement with human annotators under the same criterion.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Readability, Linguistic Annotation, Arabic NLP
Contribution Types: Data resources, Data analysis
Languages Studied: Arabic
Submission Number: 9943