CLIP It Right: Connecting Medical Images with Meaning

19 Jul 2025 (modified: 17 Aug 2025) · MICCAI 2025 Challenge MEC Submission · CC BY 4.0
Keywords: Multimodal AI, BiomedCLIP, Hands-on Workshop, Jupyter Notebook
TL;DR: An interactive introduction to multimodal AI with BiomedCLIP
Abstract: What is the most effective way to train machines to interpret an MRI scan and produce a report as detailed as a radiologist's? This workshop is an interactive introduction to multimodal AI in medicine. You will learn how AI understands and combines clinical data by exploring and experimenting with a multimodal model, BiomedCLIP-PubMedBERT, which links clinical reports with medical images (e.g. chest X-rays) in a shared semantic space for prediction, retrieval, and classification tasks. You will gain insight into how AI "understands" text, "sees" images, and connects the two modalities to make predictions. We will methodically walk through the CLIP pipeline to show how each submodule works: the workshop begins with tokenisation, token encoders, and text encoders; we then examine image encoders; finally, we explain how both modalities are connected in a shared embedding space. The objective is to give students a straightforward introduction to multimodal AI and CLIP, while also analysing the general CLIP pipeline to inform future research. The interactive notebook is available here: https://colab.research.google.com/drive/14eQKDbnxY7ly2JaQG-C1D-4zgpM5VpYd?usp=sharing&source=post_page-----c0ca52fe962b--------------------------------------- with the corresponding blog post here: https://medium.com/@skm.sarah.kaye2/clip-it-right-connecting-medical-images-with-meaning-c0ca52fe962b
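To make the pipeline described in the abstract concrete, here is a minimal zero-shot classification sketch in Python. It assumes the `open_clip` library, the public `microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224` checkpoint on the Hugging Face hub, and a placeholder image path and candidate labels; it is an illustration of the general CLIP workflow, not the workshop notebook itself.

```python
import torch
import open_clip
from PIL import Image

# Load the model, image preprocessing transform, and tokenizer
# from the Hugging Face hub (public BiomedCLIP checkpoint).
model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
)
tokenizer = open_clip.get_tokenizer(
    "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
)
model.eval()

# "chest_xray.png" is a placeholder path; the labels are illustrative.
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)
labels = ["chest X-ray with pneumonia", "normal chest X-ray"]
tokens = tokenizer(labels)

with torch.no_grad():
    # Each encoder maps its modality into the shared embedding space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(tokens)
    # Normalise so the dot product is cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Scaled similarities, softmaxed over labels, give probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

The same embeddings can be reused for retrieval (ranking images by similarity to a text query) rather than classification; only the final comparison step changes.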
Submission Number: 2