Confirmation: I have read and agree with the IEEE BSN 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: multi-task learning; multimodal speech analysis; large language models (LLM); depression prediction; longitudinal modeling; digital phenotyping; natural language processing; digital health.
TL;DR: This paper explores the use of multi-task learning and trimodal speech-based data to predict depression relapse, suicidal ideation, and sleep disturbances in adolescents.
Abstract: Mental health disorders are often comorbid, highlighting the need for predictive models that can address multiple outcomes simultaneously. Multi-task learning (MTL) provides a principled approach to jointly model related conditions, enabling shared representations that improve robustness and reduce reliance on large disorder-specific datasets. In this work, we present a trimodal speech-based framework that integrates text transcriptions, acoustic landmarks, and vocal biomarkers within a large language model (LLM)-driven architecture. Beyond static assessments, we introduce a longitudinal modeling strategy that captures temporal dynamics across repeated clinical interactions, offering deeper insights into symptom progression and relapse risk. Our MTL design simultaneously predicts depression relapse, suicidal ideation, and sleep disturbances, reflecting the comorbid nature of adolescent mental health. Evaluated on the Depression Early Warning (DEW) dataset, the proposed longitudinal trimodal MTL model achieves a balanced accuracy of 70.8%, outperforming unimodal, single-task, and non-longitudinal baselines. These results demonstrate the promise of combining MTL with longitudinal monitoring for scalable, noninvasive prediction of adolescent mental health outcomes.
Track: 12. Emerging Topics (e.g. Agentic AI, LLMs for computational health with wearables)
NominateReviewer: Tuan Nguyen Gia: tuan.nguyengia@ieee.org
Submission Number: 141