GroundingDINO for Open-Set Lesion Detection in Medical Imaging

Samuel James Roughley; Johanna Paula Müller; Shangqi Gao; Zeyu Gao; Marta Ligero; Rudolfs Blums; Mireia Crispin-Ortuzar; Julia A. Schnabel; Bernhard Kainz; Cosmin I. Bercea; Ines Prata Machado

Published: 21 Jul 2025, Last Modified: 21 Aug 2025

Venue: MSB EMERGE 2025 Oral

License: CC BY 4.0

Keywords: Anomaly Detection, GroundingDINO, Prompt Engineering, Medical Imaging, Lesion Detection, Cancer Research

TL;DR: This work shows GroundingDINO outperforms YOLOv11n in medical anomaly detection and finds that organ-specific prompts help rare lesion detection.

Abstract: Open-world anomaly detection is a task in which machine learning is well-positioned to advance cancer diagnosis, potentially leading to significantly improved survival rates. For a model to be used in clinical settings, it must demonstrate high performance, robustness, and generalisability. A common approach to achieving high generalisability is to incorporate information from broader representations within the model. In this work, we investigate the application of GroundingDINO to medical anomaly detection and localisation, evaluating both its overall performance and the influence of text prompts. We find that GroundingDINO outperforms the YOLOv11n model even with minimal use of contextual information. When exploring methods to introduce more contextual information, we observe that specifying the organ within the prompt improves closed-set performance on rarer lesion classes. However, adding visual descriptions of lesions during training leads to a significant performance drop on those subsets, indicating that the model memorises prompt-image pairs rather than learning meaningful semantic relationships. Our work highlights a critical limitation of GroundingDINO in medical imaging and proposes targeted modifications to the model architecture or training strategies as promising directions for utilising richer semantic prompts to improve anomaly detection.
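The abstract contrasts three levels of prompt context: a minimal prompt, a prompt that names the organ, and a prompt that adds a visual description of the lesion. The following is a minimal sketch of how such prompt variants could be composed, not the authors' implementation. The function name `build_prompt`, its parameters, and the example organ and description strings are all hypothetical. The only convention assumed from GroundingDINO itself is that text prompts are lowercased and each phrase ends with a period.

```python
from typing import Optional


def build_prompt(
    lesion: str,
    organ: Optional[str] = None,
    description: Optional[str] = None,
) -> str:
    """Compose a detection prompt with an optional organ and visual description.

    Hypothetical helper for illustration only; the wording of the prompts
    used in the paper is not given in the abstract.
    """
    parts = []
    if description:
        # Richest level of context: a visual description of the lesion.
        parts.append(description.strip().lower())
    if organ:
        # Organ-specific context.
        parts.append(organ.strip().lower())
    # The target phrase itself is always present.
    parts.append(lesion.strip().lower())
    # Lowercase text with a trailing " ." follows the usual GroundingDINO prompt style.
    return " ".join(parts) + " ."


# Example values below are made up for illustration.
minimal = build_prompt("lesion")
organ_specific = build_prompt("lesion", organ="Liver")
descriptive = build_prompt("lesion", organ="liver", description="Hypodense round")

print(minimal)
print(organ_specific)
print(descriptive)
```

Keeping the three levels behind one function makes it straightforward to train or evaluate the same images under each prompt style and compare per-class performance, which is the kind of comparison the abstract describes.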

Submission Number: 10
