Leveraging Large Language Models for In-context Data Generation to Address Bias and Scarce Data Challenges in Mental Healthcare
Keywords: Large Language Model, Bias, Data Augmentation, Motivational Interviewing, Natural Language Processing, Psychotherapy
Abstract: Motivational Interviewing (MI) is a widely used, evidence-based counseling approach, yet the development of robust NLP systems for MI remains constrained by severe data scarcity, high annotation costs, and the need for domain expertise. While recent work has explored large language models (LLMs) for clinical text generation and augmentation, existing studies largely focus on utterance-level transformations or evaluate a limited set of models, leaving the role of long-range conversational context underexplored. In this work, we present the first systematic, session-level benchmark of LLM-based data augmentation for MI dialogues. We compare prompt-based generative augmentation at the full-session level with utterance-level, task-sensitive transformation methods across 13 state-of-the-art (SOTA) LLMs and three long-context classification models designed for extended clinical conversations. As a key outcome, we also present ICAUGAnnoMI, a novel dataset of 1,764 low- and high-quality MI dialogues spanning nearly 81k talk turns between therapist and client, together with a fidelity-aware evaluation framework that assesses semantic drift, hallucination, and adherence to core MI principles. Our empirical results demonstrate that session-level augmentation consistently outperforms utterance-level approaches in improving MI session classification, particularly for minority and low-quality cases, highlighting the importance of preserving long-range conversational structure. Beyond performance gains, our analysis provides practical insights into the strengths and limitations of contemporary LLMs in sensitive mental health settings. We release our code and commit to publicly sharing the generated data to facilitate reproducible research and future benchmarking in MI-focused NLP. The ICAUGAnnoMI data and source code are available at https://anonymous.4open.science/r/ARR_Submission_Cycle_2026/README.md
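To illustrate the session-level prompt-based generative augmentation described in the abstract, a minimal sketch follows, assuming an OpenAI-compatible chat API. The model name, prompt wording, and the `turns` record structure are illustrative placeholders and not the paper's exact pipeline.

```python
# Minimal sketch of session-level prompt-based augmentation.
# Assumes an OpenAI-compatible chat API; model name, prompt wording,
# and the session record format are hypothetical, not the paper's setup.
from openai import OpenAI

client = OpenAI()

def augment_session(turns: list[dict], quality: str, model: str = "gpt-4o") -> str:
    """Generate a new MI session preserving the source session's
    quality label ('high' or 'low') and its turn structure."""
    transcript = "\n".join(f"{t['speaker']}: {t['utterance']}" for t in turns)
    prompt = (
        f"Below is a {quality}-quality Motivational Interviewing session.\n"
        "Rewrite it as a new, plausible session with the same number of "
        "therapist/client turns, the same MI quality level, and no new "
        "clinical facts about the client.\n\n" + transcript
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,  # encourage lexical diversity across augmentations
    )
    return response.choices[0].message.content
```

In such a setup, prompting over the full transcript (rather than isolated utterances) is what lets the generator preserve long-range conversational structure, which the abstract identifies as the key factor behind the session-level approach's gains.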
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Language Modeling, Machine Learning for NLP, NLP Applications, Efficient/Low-Resource Methods for NLP, Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 10583