Predicting Student Academic Outcome Using Online Behavioural Statistics and Neighbourhood Influences: A Machine Learning Approach

Nonso Nnamoko, Joseph Barrowclough, Babatunde Onikoyi, Mark Liptrott

Published: 2025. Last Modified: 26 Feb 2026. SN Comput. Sci. 2025. License: CC BY-SA 4.0.
Abstract: Universities normally conduct student census checks at designated points in the academic calendar to ensure accurate central records. These checks complement local systems that monitor disengaging students, since disengagement can negatively impact learning experience, grades, and overall retention. Local monitoring often relies on attendance, flagging students as at risk when their attendance across all registered modules falls below a set threshold (e.g. 50%). However, this simplistic approach may not be reliable, as attendance alone cannot predict academic outcomes. Virtual learning platforms such as Blackboard provide access to online behavioural statistics beyond attendance, including login frequency, time spent on content, and other module-related indicators. Personal and environmental factors may also affect students’ ability to attend. Indicators of students’ family and neighbourhood influences can be obtained from the Office for Students’ area-based classification of young people’s participation and under-representation in higher education. Online behavioural statistics were collected from Blackboard halfway through a 12-week semester for 160 students across 4 computer science modules, and area-based classification data was also collected for each student. The aim was to predict, at week 6, whether each student would pass or fail by the end of the semester (week 12). Two variations of the dataset, (a) online behavioural statistics only and (b) online behavioural statistics combined with neighbourhood influence, were used to train and evaluate several machine learning algorithms on this prediction task. Predictive performance was measured by macro F-score and compared with the traditional attendance-based method of identifying at-risk students.
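The attendance-based baseline and the macro F-score used for comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: attendance is assumed to be a single proportion (0 to 1) aggregated across a student's registered modules, the 50% threshold is the example value from the text, and the function names and toy data are hypothetical. The macro F-score is the unweighted mean of the per-class F1 scores for "pass" and "fail", so the minority class counts as much as the majority class.

```python
from typing import Sequence


def attendance_baseline(attendance: Sequence[float], threshold: float = 0.5) -> list[str]:
    """Hypothetical baseline: predict 'fail' (at risk) when aggregated attendance is below the threshold."""
    return ["fail" if a < threshold else "pass" for a in attendance]


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_score(y_true: Sequence[str], y_pred: Sequence[str],
                  labels: Sequence[str] = ("pass", "fail")) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy, made-up data for illustration only (not from the study).
attendance = [0.9, 0.4, 0.7, 0.3, 0.55, 0.45, 0.8, 0.2]
outcome = ["pass", "pass", "pass", "fail", "fail", "fail", "pass", "fail"]

predicted = attendance_baseline(attendance)
print(macro_f_score(outcome, predicted))  # → 0.75
```

In the study, a machine learning model's week-6 predictions would take the place of `predicted`, and the same metric would be computed on both sets of predictions so the two approaches can be compared directly.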
Machine learning predictions using only online behavioural statistics performed better (macro F-score = 81%) than those using online behavioural statistics combined with neighbourhood influence (macro F-score = 79%). The better machine learning predictions also outperformed the traditional attendance-based approach (macro F-score = 67%) by 14 percentage points. The results support suggestions in the literature that online behavioural statistics are valid indicators for predicting student success (or failure). Further research is required to establish how neighbourhood influences can be used effectively as a complementary element.
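The 14-point gap between the best machine learning result (81%) and the attendance-based baseline (67%) is an absolute difference in macro F-score. The worked check below, using only those two reported figures, shows how it compares with the relative improvement over the baseline:

```latex
\Delta_{\text{abs}} = 81\% - 67\% = 14 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{81 - 67}{67} = \frac{14}{67} \approx 0.209 \;(\approx 20.9\%)
```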