Abstract: The emergence of Large Language Models has attracted the interest of the research community as well as the general public due to the impressive improvement in their communication and comprehension capabilities in general conversation. Nevertheless, many domains still require further evaluation, especially those involving sensitive data and users, such as mental health. In this article, we evaluate several ensemble approaches for combining the Zero-Shot predictions of two families of open-source Language Models, RoBERTa and Llama-2, on the task of mental-health topic classification under limited data and computational resources. To this end, we employed two datasets of realistic questions and answers, Counsel-Chat and 7Cups, labeled with 28 and 39 fine-grained, imbalanced mental-health topics, respectively. The best ensembles of non-fine-tuned models with Zero-Shot approaches achieved an accuracy (ACC) of 43.29%, a weighted-F1 (W-F1) of 41.32%, and a macro-F1 (M-F1) of 31.79% on the 28 topics of Counsel-Chat; and an ACC of 35.57%, a W-F1 of 39.66%, and an M-F1 of 28.12% on the 39 topics of the 7Cups dataset. The error analysis reveals that the models have difficulty detecting less concrete topics (e.g., ‘Social’), which suggests future lines of work: reorganizing the classes into topics and sub-topics, or incorporating models adapted to these domains into the ensemble to compensate for these errors.