From Emotion Recognition to Mind-Wandering Detection: A Comparative Analysis of Video-Based Emotion Foundation Models
Keywords: mind-wandering detection, educational video analysis, cognitive state inference, affective computing, facial expression recognition, frozen encoders, foundation models, multimodal learning, emotion recognition, attention-aware learning systems.
TL;DR: We test whether recent emotion-recognition foundation-model features transfer better to video-based mind-wandering detection than previous approaches; Emotion-LLaMA-based representations produce more ambiguous, less aligned predictions.
Abstract: Automated mind-wandering (MW) detection from educational video offers a potential path toward continuous and non-intrusive measurement of attentional state during learning. Recent work introduced a pragmatic starting point for video-based MW detection by transferring facial emotion recognition (ER) features to an in-lab reading dataset with MW labels, showing that an AffectNet-pretrained ResNet50 encoder can support above-chance prediction. In this work, we revisit this approach in light of recent ER foundation models by evaluating four frozen feature extractors (the AffectNet-pretrained ResNet50 baseline, MAE, VideoMAE, and the full Emotion-LLaMA representations) within the same downstream MW classification task. Across experiments, the AffectNet-pretrained baseline remains the strongest overall encoder, while none of the newer Emotion-LLaMA-based representations improves MW prediction despite greater architectural sophistication. To understand this gap, we analyze per-encoder error profiles, prediction-score separability, shared versus encoder-specific failures, hard versus easy subsets, and Emotion-LLaMA's predicted emotion labels. Results indicate that Emotion-LLaMA, a state-of-the-art foundation model across several ER benchmarks, produces more ambiguous MW decision scores, over-predicts MW more frequently, and differs only weakly across MW-relevant error cases, suggesting that stronger emotion recognition models do not necessarily provide useful features for mind-wandering detection. Our findings showcase limitations of "emotion to mind wandering" transfer, highlighting the need to develop encoders that capture learning-specific signals.
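To make "prediction-score separability" and "over-predicts MW" concrete, the following is a minimal sketch of how such quantities could be computed for one encoder's decision scores. It is an illustration only, not the analysis used in the paper: the choice of AUROC as the separability measure, the 0.5 threshold, the 0.1 ambiguity margin, the function names, and the toy scores are all assumptions.

```python
# Hypothetical helpers; names, threshold, and margin are illustrative assumptions.

def auroc(scores, labels):
    """Probability that a random MW example outscores a random non-MW example.

    Ties count as half a win. labels use 1 for MW and 0 for non-MW.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predicted_mw_rate(scores, threshold=0.5):
    """Fraction of examples labeled MW at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def ambiguous_fraction(scores, threshold=0.5, margin=0.1):
    """Fraction of scores lying strictly within `margin` of the threshold."""
    return sum(abs(s - threshold) < margin for s in scores) / len(scores)


# Toy data (made up): one well-separated and one ambiguous score profile.
labels = [1, 1, 0, 0, 0, 1]
separable = [0.90, 0.80, 0.20, 0.10, 0.30, 0.70]
ambiguous = [0.55, 0.45, 0.52, 0.48, 0.58, 0.62]

for name, scores in [("separable", separable), ("ambiguous", ambiguous)]:
    print(name, auroc(scores, labels),
          predicted_mw_rate(scores), ambiguous_fraction(scores))
```

Comparing the predicted MW rate against the true MW base rate in the labels (here 3 of 6) gives a simple indication of over-prediction, while a high ambiguous fraction indicates scores clustered near the decision boundary.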
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Track: Proceeding Track
Submission Number: 26