Machine Learning Models for Predicting Suicidal Ideation in University Students

Fernanda Rubio Valenzuela

Published: 22 Sept 2025, Last Modified: 27 Nov 2025, WiML @ NeurIPS 2025, CC BY 4.0

Keywords: Machine Learning, Suicidal ideation, Feature selection, Classification

Abstract: Suicidal ideation is one of the most pressing, yet preventable, manifestations of psychological distress. Suicidal behavior spans ideation, planning, and attempts; ideation is especially critical because it represents an earlier and potentially reversible stage. Focusing on ideation therefore allows timely detection and preventive action, making it a key target for intervention. University students are a population of particular concern. The transition to higher education often brings academic overload, social adjustment, and reduced access to traditional support networks, all of which can compromise psychological well-being. In this context, symptoms such as depression, anxiety, and suicidal ideation are not only prevalent but also frequently normalized as part of student life. This normalization can obscure the severity of underlying mental health problems and allow deterioration to progress unchecked. Understanding and predicting suicidal ideation among university students is therefore essential for developing preventive strategies that address risk before it escalates to more severe outcomes. This study contributes to that effort with a novel dataset of 591 Chilean undergraduate students from 28 universities and 71 academic programs. Unlike studies that rely on secondary or institutional records, this dataset was designed and collected from scratch using validated psychological instruments, including the DASS-21, the Beck Hopelessness Scale, and the Okasha Suicidality Scale. Variables were selected on the basis of a review of the scientific literature and validated in consultation with academic psychologists to ensure both theoretical and clinical relevance. Five supervised machine learning models were then implemented and compared on these data: Logistic Regression, Random Forest, Support Vector Machines (SVM), Neural Networks, and XGBoost.
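Instruments such as the DASS-21 yield numeric scores that can be kept raw or collapsed into clinical severity categories. As a minimal sketch of that distinction, the hypothetical helpers below score the DASS-21 depression subscale (seven items rated 0 to 3, summed and doubled to match the DASS-42 scale) and map the result to the commonly published DASS depression severity bands. The function names, the example responses, and the choice of these particular cutoffs are assumptions for illustration; the abstract does not state how scores were categorized.

```python
# Hypothetical helpers (not from the study): score the DASS-21 depression
# subscale and map it to a severity band. The cutoffs below are the commonly
# published DASS depression ranges; which bands the study used is an assumption.

DEPRESSION_BANDS = [
    (9, "normal"),
    (13, "mild"),
    (20, "moderate"),
    (27, "severe"),
]


def dass21_depression_score(item_responses: list[int]) -> int:
    """Sum the seven depression items (each rated 0-3) and double the total,
    which places DASS-21 scores on the DASS-42 scale (range 0-42)."""
    if len(item_responses) != 7:
        raise ValueError("the DASS-21 depression subscale has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return 2 * sum(item_responses)


def depression_band(score: int) -> str:
    """Map a doubled depression score to a categorical severity label."""
    for upper, label in DEPRESSION_BANDS:
        if score <= upper:
            return label
    return "extremely severe"


# Two hypothetical students with different raw scores.
for responses in ([2, 2, 1, 2, 2, 1, 1], [2, 2, 2, 2, 2, 1, 2]):
    raw = dass21_depression_score(responses)
    print(raw, depression_band(raw))  # prints "22 severe", then "26 severe"
```

Both example students land in the "severe" band despite a four-point gap in raw score; a model given only the category cannot see that difference, which is one way categorical inputs discard information that raw scores retain.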
Beyond comparing traditional and advanced models, the study emphasizes technical strategies that are often underused in psychological research, including feature engineering, class balancing, cross-validation, and recursive feature elimination with cross-validation (RFECV). These tools improve predictive performance and also aid interpretability, highlighting both established risk factors, such as depression and hopelessness, and less commonly studied variables, such as neurodivergence and gender. Logistic Regression illustrates the value of these techniques: with raw numerical scores as input, recall for suicidal ideation rose from 0.82 to 0.88 and accuracy from 0.82 to 0.85 after class balancing and optimization were added. Using categorical data instead, which is more common in clinical settings, raised recall further to 0.92 but lowered accuracy to 0.65, showing the trade-off that arises when only diagnostic categories, rather than raw test scores, are available. This experimental comparison shows how data representation shapes predictive outcomes and helps bridge clinical practice and computational modeling. By connecting psychological theory with predictive modeling, this research demonstrates the potential of machine learning to improve early detection of suicidal ideation in university populations. The findings can inform computational approaches to mental health and guide interventions in a population where prevention is both urgent and feasible. This work has also been submitted to the 15th IEEE International Conference on Pattern Recognition Systems (ICPRS 2025), reflecting its alignment with current work in pattern recognition and applied machine learning.
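The abstract does not say which class-balancing method was used. As a minimal sketch, assuming class reweighting, the code below computes inverse-frequency weights with the formula n_samples / (n_classes * count_c), which is the heuristic scikit-learn documents for `class_weight="balanced"`, together with the two metrics reported above: recall for the positive (ideation) class and overall accuracy. The function names, labels, and predictions are hypothetical toy values, not the study's data.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Mirrors the heuristic scikit-learn documents for class_weight="balanced";
    rarer classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def recall_and_accuracy(
    y_true: list[int], y_pred: list[int], positive: int = 1
) -> tuple[float, float]:
    """Return (recall of the positive class, overall accuracy).

    Assumes y_true contains at least one positive example.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # caught cases
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed cases
    correct = sum(t == p for t, p in pairs)
    return tp / (tp + fn), correct / len(pairs)


# Hypothetical toy labels: 1 = ideation present (minority class), 0 = absent.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(balanced_class_weights(y_true))  # {0: 0.625, 1: 2.5}

# A classifier that never flags ideation vs. one that flags more liberally.
conservative = [0] * 10
liberal = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
print(recall_and_accuracy(y_true, conservative))  # (0.0, 0.8)
print(recall_and_accuracy(y_true, liberal))       # (1.0, 0.7)
```

In this toy case the liberal classifier gives up some accuracy to catch every positive case, the same direction of trade-off the abstract reports for categorical inputs. Because a missed case of ideation is typically the costlier error in screening, recall is often prioritized over raw accuracy.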

Submission Number: 52
