Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: medical images, vision language pre-training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Vision-Language Pre-training (VLP) has proven effective for analysing
medical images by leveraging the semantic congruence between medical images and
their corresponding reports. It efficiently learns visual representations, which
in turn facilitate the analysis and interpretation of intricate imaging data. However,
this observation has been validated predominantly on single-modality data (mostly
2D images such as X-rays); adapting VLP to learn unified representations for medical
images in real-world scenarios remains an open challenge. The difficulty arises
because medical images often span a variety of modalities, in particular modalities
with different numbers of dimensions (e.g., 3D images such as Computed Tomography).
To overcome these challenges, we propose a Unified Medical Image
Pre-training framework, namely UniMedI, which utilizes diagnostic reports as
a common semantic space to create unified representations for diverse modalities
of medical images (especially for 2D and 3D images). Under the text’s guidance,
we effectively uncover visual modality information, identifying the affected areas
in 2D X-rays and the slices containing lesions in sophisticated 3D CT scans, ultimately
enhancing the consistency across various medical imaging modalities. To demonstrate
the effectiveness and versatility of UniMedI, we evaluate its performance
on both 2D and 3D images across 10 different datasets, covering a wide range of
medical image tasks such as classification, segmentation, and retrieval. UniMedI
has demonstrated superior performance in downstream tasks, showcasing its
effectiveness in establishing a universal medical visual representation.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4965