Visual Medical Entity Linking with VELCRO

Kathryn Carbone, Liam Hebert, Robin Cohen, Lukasz Golab

Published: 27 Nov 2025, Last Modified: 09 Dec 2025
ML4H 2025 Poster
License: CC BY 4.0
Keywords: visual entity linking, image segmentation, contrastive learning
Track: Proceedings
Abstract: We study a visual entity linking (VEL) problem in which a user selects a region of interest (RoI) in an image (e.g., a brain tumour) and queries a textual knowledge base (KB) for information about the RoI. To solve this problem using cross-modal embeddings such as CLIP, we can encode the KB entries, then either encode the whole image or just the cropped RoI, and run a similarity search between the query and the KB embeddings. However, using the entire image as the query may retrieve KB entries related to other aspects of the image beyond the RoI, whereas using the RoI alone as the query ignores context, which is critical for recognizing and linking complex entities in medical images. To address these shortcomings, we propose VELCRO – visual entity linking with contrastive RoI alignment – which adapts an image segmentation model to VEL by aligning the contextual embeddings produced by its decoder with the KB using contrastive learning. This strategy preserves the information contained in the surrounding image while focusing KB alignment on the RoI. Experiments on medical VEL show that VELCRO achieves 95.3% linking accuracy compared to 83.9% or lower for baselines.
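The similarity-search step described in the abstract can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: a real system would use a cross-modal encoder such as CLIP for the KB text and the RoI image, whereas here the encoders are stand-in placeholders (random unit embeddings, with the query simulated as a noisy copy of the correct entry's embedding) so the linking-by-cosine-similarity logic can run on its own.

```python
import numpy as np

# Hypothetical sketch of CLIP-style visual entity linking via similarity search.
# The "encoders" below are placeholders, NOT real CLIP encoders: KB entries get
# random unit embeddings, and the RoI query is simulated as a noisy copy of the
# correct entry's embedding.

rng = np.random.default_rng(0)
EMBED_DIM = 64


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize rows to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def encode_kb_entries(num_entries: int) -> np.ndarray:
    """Stand-in for a text encoder over KB entries (placeholder embeddings)."""
    return l2_normalize(rng.normal(size=(num_entries, EMBED_DIM)))


def encode_roi(kb_embeddings: np.ndarray, true_idx: int,
               noise: float = 0.05) -> np.ndarray:
    """Stand-in for an image/RoI encoder: the correct entry's embedding plus noise,
    simulating an RoI whose visual embedding is close to its KB description."""
    q = kb_embeddings[true_idx] + noise * rng.normal(size=EMBED_DIM)
    return l2_normalize(q)


def link(query: np.ndarray, kb_embeddings: np.ndarray) -> int:
    """Link the query to the KB entry with highest cosine similarity."""
    return int(np.argmax(kb_embeddings @ query))


kb = encode_kb_entries(100)          # embed all KB entries once, offline
query = encode_roi(kb, true_idx=42)  # embed the user-selected RoI
print(link(query, kb))               # → 42
```

The baselines discussed in the abstract differ only in what is passed to the image encoder (the whole image vs. the cropped RoI); VELCRO instead aligns the segmentation decoder's contextual RoI embeddings to the KB embedding space with a contrastive objective, so the same argmax-similarity linking step applies at inference time.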
General Area: Models and Methods
Specific Subject Areas: Medical Imaging, Representation Learning
Supplementary Material: zip
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 168