Keywords: Machine Learning, Suicidal ideation, Feature selection, Classification
Abstract: Suicidal ideation is one of the most pressing yet preventable manifestations of psychological distress. While suicidal
behavior includes ideation, planning, and attempts, suicidal ideation is particularly critical because it represents
an earlier and potentially reversible stage. Focusing on ideation allows for timely detection and preventive action,
making it a key target for intervention.
University students constitute a population of special concern. The transition to higher education is often accompanied
by academic overload, social adjustment, and reduced access to traditional support networks, all of which can
compromise psychological well-being. In this context, symptoms such as depression, anxiety, and suicidal ideation
are not only prevalent but also frequently normalized as part of student life. This normalization risks obscuring the
severity of underlying mental health concerns, allowing deterioration to progress unchecked. Understanding and
predicting suicidal ideation among university students is therefore essential for developing preventive strategies that can
address risk before escalation to more severe outcomes.
This study contributes to this effort by building a novel dataset of 591 Chilean undergraduate students across 28
universities and 71 academic programs. Unlike studies relying on secondary or institutional records, this dataset
was designed and collected from scratch, using validated psychological instruments such as the DASS-21, Beck’s
Hopelessness Scale, and the Okasha Suicidality Scale. The selection of variables was guided by a review of the
scientific literature and validated by consultation with academic psychologists to ensure both theoretical and clinical
relevance.
To analyze these data, five supervised machine learning models were implemented: Logistic Regression, Random
Forest, Support Vector Machines (SVM), Neural Networks, and XGBoost. Beyond comparing traditional and advanced
models, the study emphasizes technical strategies often underutilized in psychological research, including
feature engineering, class balancing, cross-validation, and recursive feature elimination with cross-validation (RFECV). These tools not
only improve predictive performance but also expand interpretability by highlighting both established risk factors,
such as depression and hopelessness, and less commonly studied variables, including neurodivergence and gender.
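As a rough illustration of how class balancing, cross-validation, and RFECV can be combined, the following is a minimal sketch using scikit-learn. It is not the study's pipeline: the data are synthetic (only the sample size of 591 is borrowed from the abstract), and the 70/30 class ratio, the use of class weighting as the balancing method, the five folds, and the recall scoring are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, NOT the study's dataset.
# The 70/30 class ratio and 20 features are assumptions for illustration.
X, y = make_classification(
    n_samples=591, n_features=20, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# class_weight="balanced" reweights each class inversely to its frequency
# (one possible balancing strategy; resampling would be another).
selector_model = LogisticRegression(class_weight="balanced", max_iter=1000)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # RFECV drops one feature per step and keeps the subset with the best
    # cross-validated recall.
    ("select", RFECV(estimator=selector_model, step=1, cv=cv, scoring="recall")),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Outer cross-validation: feature selection is refit inside every fold,
# so the held-out fold never influences which features are kept.
recall_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="recall")

pipeline.fit(X, y)
n_selected = pipeline.named_steps["select"].n_features_

print("Recall per fold:", recall_scores.round(2))
print("Features kept by RFECV:", n_selected)
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: running RFECV once on the full dataset before cross-validating would leak information from the evaluation folds into the feature selection.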
Logistic Regression illustrates the value of these techniques: using raw numerical data, recall for suicidal ideation
improved from 0.82 to 0.88 and accuracy from 0.82 to 0.85 after incorporating class balancing and optimization.
With categorical data, which are more common in clinical contexts, recall rose further to 0.92 but accuracy fell to 0.65,
showing the trade-offs when only diagnostic categories rather than raw test scores are available. This experimental
comparison underscores how data representation shapes predictive outcomes and helps bridge clinical practice with
computational modeling.
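The trade-off between recall and accuracy described above follows from how the two metrics are defined. The sketch below computes both from confusion-matrix counts; the counts are made-up numbers chosen only to show that recall can rise while accuracy falls, and they are not results from the study.

```python
def recall(tp: int, fn: int) -> float:
    """Share of true positive cases that the model detects."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases, positive and negative, classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion-matrix counts (NOT from the study).
# Model B catches more positive cases but raises many more false alarms.
model_a = {"tp": 40, "fn": 10, "fp": 10, "tn": 40}
model_b = {"tp": 45, "fn": 5, "fp": 35, "tn": 15}

for name, m in (("A", model_a), ("B", model_b)):
    r = recall(m["tp"], m["fn"])
    a = accuracy(m["tp"], m["tn"], m["fp"], m["fn"])
    print(f"Model {name}: recall={r:.2f}, accuracy={a:.2f}")
# Model A: recall=0.80, accuracy=0.80
# Model B: recall=0.90, accuracy=0.60
```

In a screening setting, a model like the hypothetical Model B may still be preferable if missing an at-risk student is considered costlier than a false alarm, which is why recall is reported alongside accuracy.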
By connecting psychological theory with predictive modeling, this research demonstrates the potential of machine
learning to enhance early detection of suicidal ideation in university populations. The findings offer insights for
advancing computational approaches to mental health and for guiding interventions in a population where prevention
is both urgent and feasible.
This work has also been submitted to the 15th IEEE International Conference on Pattern Recognition Systems
(ICPRS 2025), reflecting its alignment with current discussions in pattern recognition and applied machine learning.
Submission Number: 52