Keywords: Behavioral Healthcare, Clinical Decision Support, Large Language Models (LLMs), Interpretable Models, Explainable Models
Abstract: Behavioral healthcare risk assessment remains a challenging problem due to the highly multimodal nature of patient data and the temporal dynamics of mood and affective disorders. While large language models (LLMs) have demonstrated impressive reasoning capabilities, their effectiveness in structured clinical risk scoring remains unclear. In this work, we introduce HARBOR, a behavioral health–aware language model designed to predict a discrete mood and risk score, termed the Harbor Risk Score (HRS), on a Likert scale from -3 (severe depression) to +3 (mania). We also release PEARL, a longitudinal behavioral healthcare dataset spanning four years of monthly observations from three patients, containing physiological, behavioral, and self-reported mental health signals. We benchmark traditional machine learning models, proprietary LLMs, and HARBOR across multiple evaluation settings and ablations. Our results show that HARBOR substantially outperforms both classical baselines and off-the-shelf LLMs, achieving 69% accuracy compared to 54% for logistic regression and 29% for the strongest proprietary LLM baseline.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Clinical Applications, Large Language Models, Interpretability, Alignment in LLMs, Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources, Position papers
Languages Studied: English
Submission Number: 3964