Exploring Deep Learning and Grad-CAM for Speech-Based Detection of Mild Traumatic Brain Injury

Fredy Rojas, Samaneh Madanian, John Michael Templeton, Christian Poellabauer, Sandra L. Schneider

Published: 01 Jan 2024, Last Modified: 20 May 2025, Venue: IEEE Big Data 2024, License: CC BY-SA 4.0

Abstract: Mild traumatic brain injury (mTBI) is challenging to diagnose due to its subtle and transient symptoms, making noninvasive diagnostic tools crucial for early detection. This study explores the use of a custom ResNet deep learning model combined with the Grad-CAM interpretability technique for mTBI detection via speech analysis. Speech data were transformed into Mel-spectrograms and fed into the model for binary classification between concussed and control individuals. The Grad-CAM method provided insights into which frequency regions of the Mel-spectrogram were most important for the model's predictions, with higher-frequency regions identified as significant for the model in detecting mTBI. Using Monte Carlo Cross-Validation (MCCV), we evaluated 50 different subject-level train-test split configurations to gain insights into the stability and variability of the model's performance. This analysis helps assess the model's ability to learn consistent patterns within the dataset and suggests potential generalization tendencies. The variability observed in the distribution of performance metrics underscores the importance of robust evaluation methods, particularly when working with small datasets. The combination of deep learning, a robust evaluation technique, and interpretability in this study contributes to the development of clinically viable, speech-based tools for mTBI detection, with potential applications in sports and healthcare settings.
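The subject-level MCCV procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `subject_level_splits` and `summarize`, the 20% test fraction, the random seed, and the toy `recordings` list are all assumptions made for illustration. Only the count of 50 splits comes from the abstract. The point it shows is that splits are drawn over subjects rather than over individual recordings, so no speaker appears in both the training and test sets of a given split.

```python
import random
from statistics import mean, stdev


def subject_level_splits(subject_ids, n_splits=50, test_fraction=0.2, seed=0):
    """Yield (train_subjects, test_subjects) pairs for Monte Carlo cross-validation.

    Each split is an independent random partition of the unique subjects.
    test_fraction and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    n_test = max(1, round(len(subjects) * test_fraction))
    for _ in range(n_splits):
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        # First n_test shuffled subjects form the test set, the rest the train set.
        yield set(shuffled[n_test:]), set(shuffled[:n_test])


def summarize(scores):
    """Describe the spread of a per-split metric (needs at least two scores)."""
    return {
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical toy data: 20 subjects with 3 recordings each, as (subject_id, label).
recordings = [(f"s{i:02d}", i % 2) for i in range(20) for _ in range(3)]
splits = list(subject_level_splits([sid for sid, _ in recordings]))

# Recordings are assigned to train or test according to their subject.
train_subjects, test_subjects = splits[0]
train_set = [r for r in recordings if r[0] in train_subjects]
test_set = [r for r in recordings if r[0] in test_subjects]
```

In a full pipeline, each split would train a fresh model on `train_set` and score it on `test_set`, and `summarize` would then be applied to the 50 resulting scores to describe the performance distribution.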