Abstract: More than 3% of people worldwide experience depression. The diagnosis is established through interviews and clinical observation, a time-consuming and costly process. In addition, depression is associated with a variety of symptoms that are difficult for a human observer to capture. Many studies propose automatic mental disorder recognition (MDR) using machine learning methods based on acoustic or linguistic feature extraction, followed by a complex process of selecting the most suitable features. However, data collection is difficult; a solution for MDR must therefore handle limited data and avoid complicated, uninterpretable feature engineering. Here, we propose four methods based on the fine-tuned Wav2Vec-2.0 model. These approaches overcome the limitations above because this transformer model captures information from both the acoustic and linguistic modalities and does not require a large collection of labelled data. Moreover, three of the proposed methods are novel approaches to long audio classification and allow us to evaluate how well acoustic transformer models handle long speech recordings.
External IDs: dblp:conf/aist/ShermanISKD24