Improving Radiology Report Generation with D-Net: When Diffusion Meets Discriminator

Published: 01 Jan 2024 (Last Modified: 08 Apr 2025) · ICASSP 2024 · CC BY-SA 4.0
Abstract: Radiology report generation (RRG) aims to automatically provide observations and insights into a patient’s condition based on radiology images, which can greatly reduce the workload of physicians while maintaining the quality of medical care. Existing works leverage the Transformer decoder to generate reports word by word. However, unlike image captions, radiology reports are long texts containing many semantic words. Autoregressive methods, such as Transformer-based methods, accumulate errors during generation and produce unsatisfactory reports. Building on the recent success of diffusion models, we propose a novel diffusion-based paradigm for RRG that leverages visual information as a condition, making the generation process focus on pathological features within the radiology image. Meanwhile, we integrate a discriminator into each layer of the diffusion model to actively judge whether the generated words are meaningful, which, on the one hand, controls the length of predicted reports and, on the other hand, calibrates confidence scores and token generation results, improving the quality of the generated reports. Extensive experimental results demonstrate the superiority of our proposed method. Source code is available at: https://github.com/Yuda-Jin/D-2-Net.
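The abstract describes a discriminator that judges, per position, whether a generated token is meaningful, gating out low-confidence positions to control report length. The paper's actual architecture is not reproduced here; the following is a minimal hypothetical sketch of such a gating step, where `discriminator_gate`, its threshold, and the padding convention are all assumptions for illustration.

```python
import numpy as np

def discriminator_gate(token_logits, disc_scores, pad_id, threshold=0.5):
    """Hypothetical sketch: positions the discriminator judges
    non-meaningful (score below threshold) are replaced with the
    padding token, implicitly truncating the predicted report."""
    tokens = token_logits.argmax(axis=-1)     # greedy token per position
    tokens[disc_scores < threshold] = pad_id  # gate out low-confidence positions
    return tokens

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))                   # 6 positions, vocab size 10
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.1])   # per-position discriminator scores
out = discriminator_gate(logits, scores, pad_id=0)
print(out)
```

In this toy run the last three positions fall below the threshold and are padded out, shortening the report, while the first three keep their greedy tokens.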