Abstract: Large Language Models (LLMs) have attracted interest due to their sophisticated natural language understanding capabilities. Nevertheless, despite this potential, the use of their d-vectors remains barely explored in the mental health domain. In this article, we apply feature selection strategies to the embeddings extracted from the Llama-2 and MentaLlama models to solve fine-grained mental health topic classification by removing redundant features. All the proposals were evaluated on two realistic datasets with unbalanced topics, where the feature sets containing 1,000 of the 4,096 initial dimensions proved the most efficient solution, reducing complexity by 75% with minimal loss in W-F1 performance on the test set (a decrease of 1.51 percentage points on the 7Cups dataset for Llama-2, and of 0.27 percentage points on Counsel-Chat for MentaLlama). Our findings suggest that applying feature selection to the d-vectors extracted from LLMs can be beneficial, especially when computational resources are scarce.
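To make the pipeline concrete, the sketch below illustrates one way to reproduce the setup described in the abstract: mean-pooled 4,096-dimensional embeddings from a Llama-2 checkpoint, a filter-style selector keeping 1,000 dimensions, and a downstream classifier scored with weighted F1. The selection method (ANOVA F-test via `SelectKBest`), the logistic-regression classifier, and the toy texts/labels are all assumptions for illustration, not the paper's exact configuration; the `meta-llama/Llama-2-7b-hf` checkpoint additionally requires gated access on the Hugging Face Hub.

```python
# Minimal sketch under stated assumptions: SelectKBest + logistic regression
# stand in for the paper's feature selection strategy and classifier.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

model_name = "meta-llama/Llama-2-7b-hf"  # hidden size 4,096
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def embed(texts):
    """Mean-pool the last hidden state into one 4,096-dim vector per text."""
    vecs = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 4096)
        vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vecs)

# Placeholder posts and topic labels standing in for a labelled
# dataset such as 7Cups or Counsel-Chat (hypothetical examples).
train_texts = ["I feel anxious about work lately", "My partner and I keep arguing"]
train_labels = [0, 1]
test_texts = ["Job stress keeps me up at night"]
test_labels = [0]

X_train, X_test = embed(train_texts), embed(test_texts)

# Keep 1,000 of the 4,096 dimensions (~75% reduction), as in the abstract.
selector = SelectKBest(f_classif, k=1000).fit(X_train, train_labels)
clf = LogisticRegression(max_iter=1000).fit(
    selector.transform(X_train), train_labels)

preds = clf.predict(selector.transform(X_test))
print("W-F1:", f1_score(test_labels, preds, average="weighted"))
```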