Leveraging Hard Negative Priors for Automatic Medical Report Generation

Bhanu Prakash Voutharoja; Lei Wang; Luping Zhou

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission, Readers: Everyone

Keywords: Medical Report Generation, Image Captioning, Hard Negatives

Abstract: Automatic medical report generation has recently become an active research topic in the medical imaging field. To generate a coherent and diverse report, a model must identify both normal and abnormal regions in a medical image. However, medical datasets are highly biased towards normal regions. As a result, most existing models tend to generate a generic report without sufficiently considering the uniqueness of individual images. In this paper, we propose a learning framework that extracts distinctive image and report features for each sample by distinguishing it from its closest peer (denoted as the hard negative in this paper) and gradually increasing the difficulty of this task by synthesizing progressively harder negatives during training. Specifically, a prior hard negative report, which is the report closest to an anchor report in the dataset, is first identified using a pre-trained Sentence Transformer. To force our report decoder to capture highly distinctive and image-correlated text features, progressively harder negative reports are synthesized by gradually moving the prior hard negative report towards the anchor report in the latent space during training. The harder negative report is used to evaluate a triplet loss, which is minimized to enforce that the distance between a matched image and report is smaller than the distance between the image and its synthesized harder negative report. Meanwhile, the images associated with the anchor report and its prior hard negative report form a hard negative image pair, and a cosine similarity loss is used to capture the distinctive features of the anchor image by pushing the hard negative image away. In this way, our model can achieve subtle representative resolution (i.e., the ability to distinguish two similar samples).
We demonstrate experimentally that our framework, as a general method, can be readily incorporated into a variety of existing medical report generation models and significantly improves the corresponding baselines. Our code will be publicly released at
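The training signals described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the distance metric, the margin, the way the prior hard negative is moved towards the anchor, or the exact form of the cosine similarity loss. The sketch assumes linear interpolation in the latent space, Euclidean distance, a hypothetical margin of 0.2, and a hinge on the cosine similarity. All function names are hypothetical, and the report embeddings stand in for the output of a pre-trained Sentence Transformer.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def nearest_report_indices(report_emb):
    """Prior hard negative: index of the most cosine-similar *other* report."""
    z = l2_normalize(report_emb)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a report must not select itself
    return sim.argmax(axis=1)

def synthesize_harder_negative(anchor, prior_negative, alpha):
    """Assumed form: move the prior negative a fraction alpha towards the anchor.

    alpha = 0 returns the prior hard negative unchanged; raising alpha over
    training yields progressively harder negatives.
    """
    anchor = np.asarray(anchor, dtype=float)
    prior_negative = np.asarray(prior_negative, dtype=float)
    return prior_negative + alpha * (anchor - prior_negative)

def triplet_loss(image, positive, negative, margin=0.2):
    """Hinge on d(image, matched report) - d(image, harder negative) + margin."""
    d_pos = np.linalg.norm(np.asarray(image) - np.asarray(positive), axis=-1)
    d_neg = np.linalg.norm(np.asarray(image) - np.asarray(negative), axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def cosine_push_loss(anchor_img, negative_img):
    """Assumed form: penalize positive cosine similarity within the image pair."""
    cos = (l2_normalize(anchor_img) * l2_normalize(negative_img)).sum(axis=-1)
    return np.maximum(cos, 0.0).mean()

# Toy usage with random stand-ins for report and image features.
rng = np.random.default_rng(0)
reports = rng.normal(size=(8, 16))
images = reports + 0.1 * rng.normal(size=(8, 16))
neg_idx = nearest_report_indices(reports)
harder = synthesize_harder_negative(reports, reports[neg_idx], alpha=0.5)
total = triplet_loss(images, reports, harder) + cosine_push_loss(images, images[neg_idx])
```

In this sketch, `alpha` would be raised on a schedule during training so that the synthesized negative approaches the anchor report; the schedule itself is not given in the abstract and is left to the caller.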

Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
