Automatic Analysis of Language Use in K-16 STEM Education and Impact on Student Performance

Published: 01 Jan 2020, Last Modified: 02 Jul 2023
Abstract: There is a growing community of research focusing on educational applications of natural language processing (NLP). These applications tend to focus on analysis of student writing for scoring and feedback, and on analysis of language learning. There has been less focus on analysis of language use in educational content, such as assessment questions and textbooks, which remains a largely expert-driven process. This work examines this space, presenting automated tools for analysis of language use in K-16 science, technology, engineering, and mathematics (STEM) education, and demonstrates the utility of automatically extracted features in studying student performance. This work also serves to bridge research in educational measurement and machine learning, providing a machine learning framework for analysis of factors that contribute to the difficulty of science assessment items. Within the broader umbrella of language use, this work focuses on two aspects: language difficulty (or linguistic complexity) and gender representation. Linguistic complexity has been studied both from the expert-driven educational perspective and in the context of machine learning and NLP-based tools. For the latter, models have shown high agreement with expert annotation for longer documents, but have not been shown to work well for shorter, informational texts. This work presents a discourse-aware hierarchical neural model for classification of linguistic complexity quantified as grade level, demonstrated to work accurately for shorter texts and achieving state-of-the-art performance. Unlike most existing NLP-based methods, the performance of our model is also validated on the downstream task of predicting student performance, where we find an impact for both K-12 and college-level STEM assessments. The classification model also generalizes to other text classification problems.
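A hierarchical document classifier of this kind can be sketched as follows. This is a minimal PyTorch illustration only: the layer types (GRU encoders), dimensions, and the 13-way grade output are assumptions for exposition, not the actual architecture described in the work.

```python
# Illustrative sketch of a hierarchical (word -> sentence -> document)
# classifier for grade-level prediction. Layer choices and dimensions
# are assumptions, not the authors' actual design.
import torch
import torch.nn as nn

class HierarchicalGradeClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=64, n_grades=13):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Word-level encoder: produces one vector per sentence.
        self.word_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Sentence-level encoder: reads sentence vectors in discourse order,
        # giving the model access to document-level structure.
        self.sent_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_grades)

    def forward(self, doc):
        # doc: (n_sentences, n_words) tensor of token ids for one document.
        _, h_word = self.word_rnn(self.embed(doc))  # (1, n_sentences, hidden_dim)
        _, h_sent = self.sent_rnn(h_word)           # (1, 1, hidden_dim)
        return self.out(h_sent.squeeze())           # logits over grade levels

doc = torch.randint(1, 5000, (4, 10))  # 4 sentences of 10 tokens each
logits = HierarchicalGradeClassifier()(doc)
print(logits.shape)  # torch.Size([13])
```

The two-level encoding is what makes the model "discourse aware" in spirit: sentence vectors are composed in order rather than pooling all words into a flat bag.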
Educational measurement research on predicting the difficulty of assessment questions is important in the context of assessment design and analysis of student learning. To understand the relative importance of factors impacting difficulty, many past studies have relied on linear models for predicting item difficulty from item characteristics. Some more recent work has looked at non-linear tree-based ensemble methods, but without analysis to identify important item characteristics. In our work with linear methods, we provide specific examples showing that the commonly used assumptions of feature independence and of a linear relationship between features and difficulty do not hold in practice. We also use non-linear ensemble models for the prediction problem, but unlike previous work, present a robust analysis of model performance and apply recently introduced methods of feature interpretation to analyze aspects that contribute to question difficulty. Our results demonstrate that some item characteristics, including linguistic complexity, have a non-linear impact on item difficulty. Analysis of how gender roles are depicted in content, including assessment questions, is also a growing area of research in the educational space. This is important because negative stereotypes can impact both student performance and retention of students in STEM. Expert annotation for this task is very time consuming and can be prohibitively expensive for large text collections. Our work presents NLP-based methods to automate this process for STEM textbooks and middle school assessment items. Specifically, we extract gendered mention counts; more nuanced aspects of the roles, agency, and authority of gendered characters; and activity characteristics.
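The ensemble-plus-interpretation approach can be illustrated with a toy sketch. The data here is synthetic, the feature names are invented, and permutation importance stands in for whatever interpretation method the study actually used; the point is only to show a non-linear model fit to item characteristics followed by a feature-importance analysis.

```python
# Toy sketch: non-linear ensemble for item-difficulty prediction with
# feature interpretation. Data and feature names are synthetic; the
# actual study's features and interpretation method may differ.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_items = 500
feature_names = ["linguistic_complexity", "word_count", "numeric_density"]
X = rng.uniform(size=(n_items, 3))

# Synthetic difficulty with a deliberately non-linear complexity effect;
# the third feature is pure noise with no effect on difficulty.
y = np.sin(3 * X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=n_items)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(feature_names, result.importances_mean):
    print(f"{name}: {imp:.3f}")
```

A tree ensemble captures the sinusoidal complexity effect that a linear model would average away, and the importance scores recover which characteristics drive difficulty.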
Using these features, we develop tools for analysis of content and assessments for gender biases, showing that biases exist both in the frequency with which masculine and feminine characters appear in the texts and in the activities, roles, agency, and authority of these mentions. Together, these results show the utility of NLP tools for analysis of language use in educational content, with downstream validation through analysis of student performance. Our findings demonstrate that NLP-based analysis tools can identify sources of difficulty even in expert-curated educational content.
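The simplest of these features, gendered mention counts, can be sketched in a few lines. The word lists and tokenization below are illustrative assumptions; the actual pipeline is far richer, also modeling roles, agency, authority, and activity characteristics.

```python
# Minimal illustration of one feature: counting gendered mentions in text.
# Word lists are illustrative assumptions, not the study's actual lexicon.
import re
from collections import Counter

MASCULINE = {"he", "him", "his", "man", "men", "boy", "boys"}
FEMININE = {"she", "her", "hers", "woman", "women", "girl", "girls"}

def gendered_mention_counts(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "masculine": sum(counts[w] for w in MASCULINE),
        "feminine": sum(counts[w] for w in FEMININE),
    }

example = "He measured the circuit while she recorded the data; the men watched."
print(gendered_mention_counts(example))  # {'masculine': 2, 'feminine': 1}
```

Aggregated over a textbook or an item pool, even this crude count exposes frequency imbalances between masculine and feminine characters; the richer role and agency features require syntactic and semantic analysis on top of it.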