Keywords: Causal Inference; Readability Assessment; Counterfactual Intervention; Backdoor Adjustment
Abstract: Readability assessment is a pivotal task in education, yet the prevailing frameworks have limitations. Indirect methods based on statistical regression are confined to a correlational paradigm and therefore cannot uncover the causal mechanisms linking text features to readability. Direct methods based on deep learning predict well but lack interpretability, which hinders the dynamic optimization of features. Grounded in the Chinese context, we propose CIRCA (Causal Interpretable Readability for Chinese Assessment). The framework disentangles spurious associations from genuine causal effects through mathematically principled counterfactual interventions, and it quantifies those effects with a model based on total variation distance. The results show that features that are insignificant in correlation analyses can still exert substantial causal impacts on readability. The determinants also vary by grade: topic ambiguity and lexical richness dominate in lower grades, whereas semantic noise is more prominent in higher grades. Readability scores computed with the correlation-based formula have a correlation coefficient of 0.63 with the grading in the Chinese Textbook Series (2022 Edition), notably lower than the 0.73 achieved by CIRCA, which demonstrates the advantage of the proposed framework.
Primary Area: causal reasoning
Submission Number: 3654
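
The abstract names two ingredients, backdoor adjustment and total variation distance, without showing how they combine. The sketch below is a minimal illustration of that combination on a toy discrete problem, not the CIRCA implementation: the function names, the binary text feature X, the binary confounder Z, the three readability levels Y, and all probabilities are hypothetical assumptions chosen only to make the arithmetic visible.

```python
# Minimal sketch (assumed setup, not the authors' code):
# estimate P(Y | do(X = x)) by backdoor adjustment over a confounder Z,
# then measure the causal effect of X as the total variation distance
# between the two interventional distributions.

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def backdoor_adjusted(p_y_given_xz, p_z, x):
    """P(Y | do(X = x)) = sum_z P(Y | X = x, Z = z) * P(Z = z)."""
    n_y = len(p_y_given_xz[x][0])
    return [
        sum(p_z[z] * p_y_given_xz[x][z][y] for z in range(len(p_z)))
        for y in range(n_y)
    ]

# Hypothetical numbers: p_y_given_xz[x][z] is a distribution over
# three readability levels for feature value x and confounder value z.
p_z = [0.5, 0.5]
p_y_given_xz = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # X = 0
    [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]],  # X = 1
]

p_do_x0 = backdoor_adjusted(p_y_given_xz, p_z, 0)
p_do_x1 = backdoor_adjusted(p_y_given_xz, p_z, 1)
causal_effect = total_variation(p_do_x0, p_do_x1)
print(p_do_x0, p_do_x1, causal_effect)
```

In this toy setting the effect is a single number between 0 and 1, so features can be ranked by how far an intervention on them shifts the readability distribution, which is a different question from how strongly they correlate with it.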