A Novel Interpretation of Classical Readability Metrics: Revisiting the Language Model Underpinning the Flesch-Kincaid Index
Abstract: In the realm of natural language processing (NLP), the quantification of text readability remains crucial, with pivotal applications in education. While the Flesch-Kincaid Grade Level (FKGL) has been a foundational metric for English text readability, recent advances, particularly models like Bidirectional Encoder Representations from Transformers (BERT), have heralded a new age of language model-based assessments. Contrary to the popular view of FKGL as a legacy metric, our research elucidates that FKGL itself encapsulates language model behavior. We introduce a novel interpretation that views FKGL as a linear blend of perplexities from specific unigram models. Leveraging the OneStopEnglish dataset, we enhance FKGL by incorporating perplexity values from state-of-the-art language models for sentence boundaries. Our results show that integrating BERT's capabilities significantly bolsters FKGL's performance. The implications are broad, suggesting potential extension to multilingual FKGL applications and providing theoretical backing for FKGL-based research in languages such as Japanese.
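As background for the reinterpretation described above, the classical FKGL formula combines words-per-sentence and syllables-per-word terms. A minimal sketch of that standard formula follows; the function name and the example counts are illustrative, and the perplexity-based reinterpretation itself is the paper's contribution, not shown here.

```python
def fkgl(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Classical Flesch-Kincaid Grade Level score.

    Combines average sentence length (words per sentence) and average
    word length (syllables per word) with the standard coefficients.
    """
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts for a short passage: 100 words, 10 sentences,
# 150 syllables.
score = fkgl(total_words=100, total_sentences=10, total_syllables=150)
print(round(score, 2))  # 6.01
```

The paper's observation is that each of these two ratio terms can be read as (a log-linear function of) the perplexity of a simple unigram model, which motivates replacing them with perplexities from stronger models such as BERT.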