Keywords: Student Engagement, Longitudinal Experiential Data, Qualitative Data, Large Language Models, Missing-Not-At-Random, Imputation, Zero-Shot Learning, Feature Selection, Fine-Tuning, Educational Analytics, Time-Series Data, Textual Reasoning
TL;DR: A three-tier LLM framework forecasts student engagement from qualitative longitudinal data, using textual reasoning and feature selection to outperform numeric baselines.
Abstract: Forecasting nuanced shifts in student engagement from longitudinal experiential (LE) data—multi-modal, qualitative trajectories of academic experiences over time—remains challenging due to high dimensionality and missingness. We propose a natural language processing (NLP)-driven framework using large language models (LLMs) to forecast binary engagement levels across four dimensions: Lecture Engagement Disposition, Academic Self-Efficacy, Performance Self-Evaluation, and Academic Identity and Value Perception. Evaluated on 960 trajectories from 96 first-year STEM students, our three-tier approach—LLM-informed imputation to generate textual descriptors for missing-not-at-random (MNAR) patterns, zero-shot feature selection via ensemble voting, and fine-tuned LLMs—processes textual non-cognitive responses. LLMs substantially outperform numeric baselines (e.g., Random Forest, LSTM) by capturing contextual nuances in student responses. Encoder-only LLMs surpass decoder-only variants, highlighting architectural strengths for sparse, qualitative LE data. Our framework advances NLP solutions for modeling student engagement from complex LE data, excelling where traditional methods struggle.
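As a rough illustration of how the three tiers described in the abstract could be wired together, consider the Python sketch below. This is not code from the paper: `query_llm` is a hypothetical stand-in for whatever LLM API is used, the example feature names are invented, and the `bert-base-uncased` checkpoint is an assumption (the abstract only states that encoder-only LLMs are fine-tuned).

```python
# Minimal sketch of the three-tier pipeline, under the assumptions above.
from collections import Counter
from typing import Callable, Dict, List, Optional

# The four engagement dimensions named in the abstract.
DIMENSIONS = [
    "Lecture Engagement Disposition",
    "Academic Self-Efficacy",
    "Performance Self-Evaluation",
    "Academic Identity and Value Perception",
]

def impute_missing(responses: Dict[int, Optional[str]],
                   query_llm: Callable[[str], str]) -> Dict[int, str]:
    """Tier 1: replace MNAR gaps with LLM-generated textual descriptors,
    conditioning on the responses the student did provide."""
    observed = "; ".join(t for t in responses.values() if t)
    filled = {}
    for week, text in sorted(responses.items()):
        if text is None:
            filled[week] = query_llm(
                f"A student's weekly reflections so far: {observed}\n"
                f"Week {week} is missing (likely not at random). "
                "Write a short textual descriptor for this gap."
            )
        else:
            filled[week] = text
    return filled

def select_features(features: List[str],
                    voters: List[Callable[[str], str]],
                    min_votes: int = 2) -> List[str]:
    """Tier 2: zero-shot feature selection by ensemble voting across LLMs."""
    prompt = ("Which of these features best predict student engagement? "
              "Answer with a comma-separated subset: " + ", ".join(features))
    votes = Counter()
    for ask in voters:
        reply = ask(prompt)
        # Count a vote for every candidate feature the LLM named.
        votes.update(f for f in features if f in reply)
    return [f for f in features if votes[f] >= min_votes]

# Tier 3: fine-tune an encoder-only classifier on the imputed, selected text.
# Hugging Face `transformers` with "bert-base-uncased" is an assumption here.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # binary engagement per dimension

if __name__ == "__main__":
    stub = lambda prompt: "study hours, sleep quality"  # stand-in LLM
    print(select_features(["sleep quality", "study hours", "club activity"],
                          voters=[stub, stub, stub]))
```

Each student-dimension trajectory would pass through all three tiers before classification; the stub usage at the bottom only demonstrates the ensemble-voting mechanics of Tier 2.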
Supplementary Material: pdf
Submission Number: 147