Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation

Published: 26 Jan 2026, Last Modified: 11 Feb 2026, Venue: ICLR 2026 Poster, License: CC BY 4.0
Keywords: radiology report generation, x-ray report generation, self-critiquing mechanism
TL;DR: Learning self-critiquing mechanisms to improve abnormality localization for accurate report generation.
Abstract: Automatic radiology reporting assists radiologists in diagnosing abnormalities in radiology images, and grounding the automatic diagnosis in abnormality locations is important for report interpretability. However, existing supervised-learning methods can end up learning superficial statistical correlations between images and reports, and they lack the multi-faceted reasoning needed to critique the relevant regions on which radiologists would focus. Recently, self-critical reasoning has been investigated in test-time scaling approaches to alleviate hallucinations of LLMs, at the cost of increased time complexity. In this work, we focus on chest X-ray report generation, with particular emphasis on clinical accuracy, and instead introduce self-critical reasoning into the model architecture and its training objective, a design better suited to real-time automatic reporting systems. In particular, three types of self-critical reasoning are proposed to critique the hypotheses of grounded abnormalities against i) alternative abnormalities, ii) an alternative patient's X-ray image, and iii) potential false-negative abnormalities. To realize this, we propose a novel Radiology Self-Critiquing Reporting (RadSCR) framework, which constructs abnormality proposals for each localized abnormality region and verifies them with the corresponding self-critiquing mechanisms. The critiqued abnormality proposals are then integrated to generate the complete report with an interpretable diagnostic process. Our experiments show that RadSCR achieves state-of-the-art performance in grounded report generation and diagnosis critiquing, demonstrating its effectiveness in generating clinically accurate reports.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10495