SAT-RRG: Self-Adaptive Training for Radiology Report Generation Leveraging LLMs for Dynamic Token-Level Refinement
Abstract: Existing radiology report generation (RRG) methods rely on word-level alignment with reference reports, making them overly sensitive to surface phrasing and blind to semantically valid variations. Lacking semantic feedback during training, these methods treat all tokens uniformly and fail to prioritize critical corrections. As a result, models cannot dynamically assess or refine report quality, leading to clinically suboptimal outputs. We propose SAT-RRG, a self-adaptive training framework that identifies phrase-level semantic errors and provides token-level supervision, both correcting mistakes and reinforcing accurate predictions. We introduce two custom loss functions: CTAL, which consolidates confidently correct tokens, and ETAPL, which penalizes overconfident semantic errors; both adapt to the evolving confidence landscape during training. The framework builds on a unified LLM backbone for both generation and error detection, so inference incurs no additional computational overhead. SAT-RRG achieves state-of-the-art performance on MIMIC-CXR and IU-Xray. The code will be released upon publication.
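As a reading aid, the sketch below shows one way the two token-level losses named in the abstract could be instantiated. It is a minimal sketch under stated assumptions, not the paper's implementation: the tensor shapes, the fixed confidence threshold tau, the externally supplied error_mask, and the unlikelihood-style penalty for ETAPL are all illustrative choices.

```python
import torch

def sat_rrg_losses(logits, targets, error_mask, tau=0.7):
    """Hedged sketch of CTAL- and ETAPL-style losses (not the paper's code).

    logits:     (B, T, V) token logits from the shared LLM backbone
    targets:    (B, T)    reference token ids
    error_mask: (B, T)    1 where the LLM-based detector flags a semantic
                          error in the generated phrase (assumed input)
    tau:        hypothetical confidence threshold; the paper's adaptive
                schedule is unknown, so a fixed value stands in here
    """
    probs = logits.softmax(dim=-1)

    # Confidence currently assigned to each reference token.
    p_ref = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, T)

    # CTAL-style term: reinforce tokens that are already predicted
    # correctly with high confidence ("consolidate" them).
    correct = ((p_ref > tau) & (error_mask == 0)).float()
    ctal = -(torch.log(p_ref + 1e-8) * correct).sum() / correct.sum().clamp(min=1.0)

    # ETAPL-style term: an unlikelihood penalty on tokens the detector
    # flags where the model is overconfident, pushing probability mass
    # away from the erroneous prediction.
    pred_ids = probs.argmax(dim=-1)                               # (B, T)
    p_pred = probs.gather(-1, pred_ids.unsqueeze(-1)).squeeze(-1)
    err = (error_mask.bool() & (p_pred > tau)).float()
    etapl = -(torch.log(1.0 - p_pred + 1e-8) * err).sum() / err.sum().clamp(min=1.0)

    return ctal, etapl
```

Because both masks are recomputed from the model's current probabilities at every step, the supervision in this sketch automatically tracks the evolving confidence landscape described in the abstract.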
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: clinical NLP, multimodality
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2589