Cascaded Contrastive Medical Language-Image Pretraining on Radiology Images

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Contrastive learning; medical imaging; multi-modality; clinical language model
Abstract: Due to its concise design and strong generalization performance, contrastive language-image pre-training (CLIP) has been investigated in the medical domain for medical image understanding. However, few studies have examined CLIP for multilevel alignment of medical information. In this paper, we propose cascaded CLIP (casCLIP), in which contrastive alignment is performed on multiple levels of information. In addition, we propose aligning the report with the entire image series, employing a multi-layer transformer to integrate the image embeddings from a study into a single series embedding. Moreover, we introduce a support-alignment opposition-de-alignment method to enhance higher-level alignment. In this study, casCLIP was pre-trained on a dataset of chest X-ray images paired with reports and with high-level disease information extracted from the reports. Experimental results on multiple public benchmarks demonstrate the effectiveness of our model for zero-shot classification.
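The two mechanisms named in the abstract — a multi-layer transformer that fuses the per-image embeddings of a study into one series embedding, and CLIP-style contrastive alignment applied at more than one level (report and disease) — can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: all module names, dimensions, the learned [SERIES] pooling token, and the way the two alignment losses are summed are assumptions; the paper's support-alignment opposition-de-alignment method is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesAggregator(nn.Module):
    """Hypothetical sketch: fuse the per-image embeddings of one study
    into a single series embedding with a small transformer encoder."""
    def __init__(self, dim=512, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Learned [SERIES] token prepended to the image sequence (an
        # assumption; mean pooling would be an equally plausible choice).
        self.series_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img_embs):  # img_embs: (batch, n_images, dim)
        tok = self.series_token.expand(img_embs.size(0), -1, -1)
        x = self.encoder(torch.cat([tok, img_embs], dim=1))
        return x[:, 0]  # series embedding: (batch, dim)

def clip_loss(a, b, temperature=0.07):
    """Standard symmetric InfoNCE contrastive loss, as in CLIP."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    agg = SeriesAggregator()
    img_embs = torch.randn(4, 3, 512)      # 4 studies, 3 views each
    report_embs = torch.randn(4, 512)      # from a report text encoder
    disease_embs = torch.randn(4, 512)     # from disease-level text
    series = agg(img_embs)
    # Cascaded objective: align the series embedding with the report
    # (lower level) and with disease-level text (higher level).
    loss = clip_loss(series, report_embs) + clip_loss(series, disease_embs)
    loss.backward()
```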
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4168