Abstract: Developing specialized dialogue systems for mental health support has recently garnered increasing attention, and it requires multi-turn conversation data. However, gathering and releasing large-scale, real-life multi-turn conversations is difficult because of data privacy protection and the time and cost involved. To address this data scarcity, we introduce SMILE, a single-turn to multi-turn inclusive language expansion technique that prompts ChatGPT to rewrite public single-turn dialogues into multi-turn ones. We first analyze the language transformation to validate the feasibility of the proposed method. We then study dialogue diversity in terms of lexical features, semantic features, and dialogue topics, demonstrating the method's effectiveness. Furthermore, we use our method to generate a large-scale, lifelike, and diverse dialogue dataset named SmileChat, comprising 55,165 dialogues with an average of 10.4 turns per dialogue. Finally, we use the collected corpus to develop a mental health chatbot, MeChat. To better assess the quality of SmileChat, we collect a small-scale real-life chat dataset of 82 counseling dialogues for model evaluation. Both automatic and human evaluations show that the trained dialogue system improves significantly and that SmileChat is of high quality.
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Chinese