Abstract: Large Language Models (LLMs) as clinical agents require careful behavioral adaptation. While adept at reactive tasks (e.g., diagnostic reasoning), LLMs often struggle with proactive engagement, such as the unprompted identification of critical missing information or risks. We introduce **BehaviorBench**, a comprehensive dataset for evaluating agent behaviors across a clinical assistance spectrum, ranging from reactive query responses to proactive interventions (e.g., clarifying ambiguities, flagging overlooked critical data). Our experiments on BehaviorBench reveal that LLMs exhibit inconsistent proactivity. To address this, we propose **BehaviorSFT**, a novel training strategy that uses behavioral tokens to explicitly condition LLMs for dynamic behavior selection along this spectrum. BehaviorSFT boosts performance, achieving up to 97.3\% overall Macro F1 on BehaviorBench and improving proactive task scores (e.g., from 95.0\% to 96.5\% for Qwen2.5-7B-Ins). Crucially, blind clinician evaluations confirmed that BehaviorSFT-trained agents exhibit more realistic clinical behavior, striking a superior balance between helpful proactivity (e.g., timely, relevant suggestions) and necessary restraint (e.g., avoiding over-intervention) compared to standard fine-tuning or explicitly instructed agents.
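To make the behavioral-token conditioning concrete, below is a minimal sketch of how such supervised fine-tuning might be set up. This is not the authors' released code: the token names (`<REACTIVE>`, `<PROACTIVE>`), the `build_example` helper, and the data format are illustrative assumptions; only the Qwen2.5-7B-Instruct base model is taken from the abstract.

```python
# Minimal sketch (assumed, not the paper's implementation) of behavioral-token
# conditioning for SFT: each training example is prefixed with a special token
# naming the target behavior, and the model is trained with the standard
# causal-LM loss so that the token controls behavior selection at inference.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # base model mentioned in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Register behavioral tokens (names are assumptions) so the model can be
# explicitly conditioned on a point of the reactive-to-proactive spectrum.
behavior_tokens = ["<REACTIVE>", "<PROACTIVE>"]
tokenizer.add_special_tokens({"additional_special_tokens": behavior_tokens})
model.resize_token_embeddings(len(tokenizer))

def build_example(behavior: str, query: str, response: str) -> dict:
    """Prefix the target behavior token so SFT learns token-conditioned behavior."""
    text = f"{behavior} {query}\n{response}{tokenizer.eos_token}"
    enc = tokenizer(text, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # standard next-token SFT loss
    return enc

# Hypothetical proactive example: the target response asks a clarifying
# question instead of answering the query directly.
batch = build_example(
    "<PROACTIVE>",
    "Patient reports chest pain for two days.",
    "Before advising, I need to clarify: is the pain radiating, and is there "
    "shortness of breath? These help distinguish urgent cardiac causes.",
)
loss = model(**batch).loss  # optimize this over a BehaviorBench-style corpus
```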
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Healthcare Agents, Proactive Agent
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Keywords: healthcare agent, proactive agent, behavioral adaptation
Submission Number: 448