Analysis of Transformers for Medical Image Retrieval

Published: 06 Jun 2024 (Last Modified: 06 Jun 2024) · MIDL 2024 Poster · CC BY 4.0
Keywords: Content-Based Medical Image Retrieval, Vision Transformers, Deep Learning, Contrastive Learning, Explainable AI.
Abstract: This paper investigates the application of transformers to medical image retrieval. Although various methods have been attempted in this domain, transformers have not been extensively explored. Leveraging vision transformers, we consider co-attention between image tokens. Two main aspects are investigated: an analysis of transformer architectures and their parameters, and an evaluation of explanation techniques. Specifically, we employ contrastive learning for attention-based retrieval that accounts for the relationships between query and database images. Experiments on diverse medical datasets, including ISIC 2017, COVID-19 chest X-ray, and Kvasir, with multiple transformer architectures demonstrate superior performance compared to convolution-based methods and to transformers trained with cross-entropy loss. Furthermore, we conduct a quantitative evaluation of several state-of-the-art explanation techniques using insertion-deletion metrics, in addition to qualitative assessments. Among these methods, Transformer Input Sampling (TIS) stands out, offering superior performance and enhanced interpretability, thus distinguishing the approach from black-box models.
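To make the retrieval setup concrete, the following is a minimal sketch of contrastive training of a vision transformer encoder and cosine-similarity retrieval. The timm backbone name, the InfoNCE-style loss, the embedding dimension, and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: contrastive ViT embeddings for content-based medical image retrieval.
# Assumptions (not from the paper): a timm ViT backbone, an InfoNCE-style loss,
# and cosine-similarity ranking of database images against a query.
import torch
import torch.nn.functional as F
import timm


class RetrievalEncoder(torch.nn.Module):
    """ViT backbone that maps images to L2-normalized embeddings."""

    def __init__(self, backbone: str = "vit_base_patch16_224", dim: int = 128):
        super().__init__()
        self.backbone = timm.create_model(backbone, pretrained=True, num_classes=0)
        self.head = torch.nn.Linear(self.backbone.num_features, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)          # (B, num_features) pooled token features
        return F.normalize(self.head(feats), dim=-1)


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Contrastive loss: matching pairs (z1[i], z2[i]) are positives, all others negatives."""
    logits = z1 @ z2.t() / tau            # (B, B) cosine similarities (embeddings are unit-norm)
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


@torch.no_grad()
def retrieve(query: torch.Tensor, database: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return indices of the k database embeddings most similar to each query."""
    scores = query @ database.t()         # cosine similarity of normalized embeddings
    return scores.topk(k, dim=-1).indices


if __name__ == "__main__":
    model = RetrievalEncoder()
    views1 = torch.randn(4, 3, 224, 224)  # stand-ins for two augmented views of the same images
    views2 = torch.randn(4, 3, 224, 224)
    loss = info_nce_loss(model(views1), model(views2))
    print("contrastive loss:", loss.item())
    print("top-3 retrieved indices per query:", retrieve(model(views1), model(views2), k=3))
```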
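The insertion-deletion evaluation of explanation methods can likewise be sketched as below. The step count, zero-baseline fill, and the simple average used to approximate the area under the confidence curve are assumptions for illustration; the paper's exact protocol may differ.

```python
# Sketch of the insertion / deletion metrics used to score saliency maps.
# Assumption (not from the paper): pixels are removed or revealed in descending
# saliency order and the metric is an approximate AUC of model confidence.
import torch


@torch.no_grad()
def insertion_deletion_auc(model, image, saliency, target, steps=50, mode="deletion"):
    """
    image:    (3, H, W) input tensor
    saliency: (H, W) importance map from an explanation method (e.g. TIS)
    target:   class index whose probability is tracked
    """
    _, h, w = image.shape
    order = saliency.flatten().argsort(descending=True)   # most important pixels first
    per_step = max(1, order.numel() // steps)

    # Deletion starts from the full image and blanks pixels; insertion starts
    # from a blank image and reveals pixels.
    current = image.clone() if mode == "deletion" else torch.zeros_like(image)
    fill = torch.zeros_like(image) if mode == "deletion" else image

    scores = []
    for i in range(0, order.numel(), per_step):
        idx = order[i:i + per_step]
        ys, xs = idx // w, idx % w
        current[:, ys, xs] = fill[:, ys, xs]
        prob = model(current.unsqueeze(0)).softmax(dim=-1)[0, target]
        scores.append(prob.item())

    # Good explanations yield a low deletion AUC and a high insertion AUC.
    return sum(scores) / len(scores)
```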
Submission Number: 211