Keywords: treatment planning, agent, self-improvement
Abstract: Formulating a treatment plan is inherently a complex reasoning and refinement task rather than a simple generation problem. However, existing large language models (LLMs) mainly rely on one-shot output without explicit verification, which may result in rough, incomplete, and potentially unsafe treatment plans. To address these limitations, we propose **TheraAgent**, an agentic framework that replaces one-shot generation with an iterative *generate-judge-refine* pipeline. By mirroring the reasoning process of human experts, who iteratively revise treatment plans, our framework progressively transforms coarse and incomplete drafts into precise, comprehensive, and safer therapeutic regimens. To support the critical *judge* component, we introduce **TheraJudge**, a treatment-specific evaluation module integrated into the inference loop to enforce clinical standards. Experiments show that TheraAgent achieves state-of-the-art results on HealthBench, leading in Accuracy and Completeness. In expert evaluations, it attains an 86\% win rate against physicians, with superior Targeting and Harm Control. Moreover, the high agreement between TheraJudge and HealthBench evaluations supports the reliability of our framework.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: clinical decision support, medical question answering
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, Chinese
Submission Number: 8336