Automatic Post-Traumatic Stress Disorder Diagnosis via Clinical Transcripts: A Novel Text Augmentation with Large Language Models
Abstract: Post-traumatic stress disorder (PTSD) is one of the most prevalent mental disorders worldwide. With the development of machine learning (ML), researchers increasingly use natural language processing (NLP) models for early diagnosis of mental disorders, including PTSD. However, these NLP tasks often suffer from data imbalance because clinical data are difficult to collect. This study therefore proposes two novel text augmentation frameworks that leverage Large Language Models (LLMs) to address data imbalance in clinical NLP tasks. The frameworks use two distinct methodologies to augment the original dataset, thereby extending the publicly available Extended Distress Analysis Interview Corpus (E-DAIC) for PTSD: generating standardized transcripts of PTSD interviews through a zero-shot (ZS) approach, and rephrasing the existing training samples via a few-shot (FS) approach. Both the FS- and ZS-augmented datasets outperform the original E-DAIC dataset in automatic PTSD diagnosis. The ZS dataset, with GPT embeddings, achieves the highest performance, demonstrating the potential of LLMs to generate authentic clinical interviews and resolve data imbalance. Although the FS approach performs slightly worse than ZS, it still surpasses the original dataset while using fewer samples and simpler prompts. The augmented datasets remain highly similar to the original E-DAIC dataset. This research has significant implications: it enables individual ML researchers to leverage powerful LLMs for innovative applications while reducing labour and time costs. LLMs can generate synthetic data at a fraction of the expense of recruiting human volunteers, facilitating future clinical NLP tasks. The approach also offers flexibility in generating realistic and professional content through prompt design.
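The two augmentation routes summarized above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names (`samples_needed`, `zero_shot_prompt`, `few_shot_prompt`), the label strings, and the prompt wording are hypothetical and are not the prompts or code used in the study. The sketch only shows how one might count the synthetic samples required to balance the classes and assemble a ZS or FS prompt; the call to an LLM is deliberately left out.

```python
from collections import Counter


def samples_needed(labels):
    """Count how many synthetic samples each class needs to match the majority class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def zero_shot_prompt(condition):
    """Build a zero-shot prompt: no example transcripts are supplied.

    The wording is hypothetical, not the prompt from the study.
    """
    return (
        "Write a transcript of a clinical interview with a participant "
        f"who {condition}."
    )


def few_shot_prompt(examples):
    """Build a few-shot prompt that asks the model to rephrase existing samples.

    The wording is hypothetical, not the prompt from the study.
    """
    shots = "\n\n".join(
        f"Transcript {i}:\n{text}" for i, text in enumerate(examples, 1)
    )
    return f"{shots}\n\nRephrase the transcripts above, preserving their meaning."


if __name__ == "__main__":
    # Toy labels standing in for an imbalanced training split.
    labels = ["ptsd", "control", "control", "control"]
    print(samples_needed(labels))
    print(zero_shot_prompt("has been diagnosed with PTSD"))
    print(few_shot_prompt(["I have trouble sleeping.", "Loud noises startle me."]))
```

In such a setup, the counts from `samples_needed` would decide how many times each prompt is sent to the LLM for the minority class.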