Ethical Dialogue Modeling: Dual-Phase Prompt Design for Safety-Constrained Mental Health Language Models

16 Sept 2025 (modified: 27 Jan 2026)ICLR 2026 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Therapeutic Dialogue Systems, Large Language Models, Safety-Constrained Prompt Design, Contextual Risk Stratification, Therapeutic Alliance Modeling, Affective State Inference, Clinical AI Safeguards
TL;DR: We propose a dual-phase prompt framework that enhances safety and therapeutic relevance in mental health language models.
Abstract: We present Ethical Dialogue Modeling (EDM), a two-phase architecture for producing safety-aware conversational support in mental health contexts. The first phase transforms incoming text and audio into structured signals that capture affective intensity, therapeutic need, figurative expression, and calibrated risk. The second phase conditions response synthesis on these signals and enforces protocol-aligned constraints, template scaffolding, and conservative decoding to reduce unsafe outputs. The framework is developed using a hybrid corpus of synthetic and publicly available de-identified dialogues and is evaluated with automated proxies alongside blinded expert judgments. Empirical analyses demonstrate improved contextual adaptation, stronger alignment with expert norms, and a reduction in hazardous responses compared with strong baselines. We also describe an evaluation protocol designed to combine scalable automated checks with focused expert adjudication and provide design principles for deploying culturally sensitive, safety-governed conversational systems.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7307
Loading