Abstract: Online learning is a rapidly growing industry. However,
a major concern about online learning is whether students are
as engaged as they are in face-to-face classes. An engagement recognition system can notify instructors about
students' engagement and thereby help improve the learning experience.
Current challenges in engagement detection include poor
label quality, extreme data imbalance, and intra-class variety, that is, the variety of behaviors observed at a given engagement
level. To address these problems, we present the CMOSE
dataset, which contains a large number of samples across different engagement levels, with high-quality labels annotated
according to psychological advice. We also propose a training mechanism, MocoRank, to handle the intra-class variety and the ordinal relationship among engagement levels. MocoRank outperforms prior engagement
detection frameworks, achieving a 1.32% increase in overall accuracy and a 5.05% improvement in average accuracy.
Further, we demonstrate the effectiveness of multi-modality
in engagement detection by combining video features with
speech and audio features. Data transferability experiments also indicate that the proposed CMOSE dataset provides
superior label quality and behavior diversity.
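
The two reported metrics can diverge sharply under the data imbalance noted above. The following is a minimal sketch, assuming that "overall accuracy" means the fraction of all samples classified correctly and that "average accuracy" means the unweighted mean of per-class accuracies; the abstract does not define either term, and the function names and toy labels below are hypothetical illustrations, not data from CMOSE.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed reading of 'average accuracy')."""
    total = defaultdict(int)  # number of samples per true class
    hit = defaultdict(int)    # number of correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical, heavily imbalanced toy labels: class 2 dominates,
# and the classifier always predicts class 2.
y_true = [2] * 8 + [0, 3]
y_pred = [2] * 10

print(overall_accuracy(y_true, y_pred))
print(average_accuracy(y_true, y_pred))
```

In this toy case a classifier that always predicts the majority class scores well on overall accuracy but poorly on average accuracy, which is one reason to report both on an imbalanced dataset.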