Keywords: Image Registration, Vision Transformers, Convolutional Neural Networks
TL;DR: Can Transformers capture long-range displacements better than CNNs?
Abstract: Convolutional Neural Networks (CNNs) are well established in medical imaging, where they tackle a variety of tasks. However, their performance is limited by their inability to capture long-range spatial correspondences within images. Recently proposed deep-learning-based registration methods try to overcome this limitation by assuming that transformers are better at modeling long-range displacements thanks to the nature of the self-attention mechanism. Even though existing transformers are already considered state-of-the-art in image registration, this key premise has not been extensively validated. In this work, we test the hypothesis by evaluating the target registration error as a function of the displacement. Our findings show that transformers outperform CNNs on a public dataset of 3D lung CT images with large displacements. Yet, the performance difference stems from transformers registering small displacements with higher accuracy. Contrary to previous beliefs, we find no evidence to support the hypothesis that transformers register long displacements better than CNNs. Additionally, our experiments provide insights on how to train vision transformers effectively for image registration on small datasets with fewer than 50 image pairs.
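To make "target registration error as a function of the displacement" concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes paired landmarks are available as (N, 3) arrays in millimetres; the function name `tre_by_displacement`, its arguments, and the choice of bin edges are all hypothetical and introduced here only for illustration.

```python
import numpy as np

def tre_by_displacement(fixed_pts, moving_pts, predicted_pts, bin_edges):
    """Mean target registration error (TRE) per initial-displacement bin.

    Assumed inputs (hypothetical names, not from the paper):
      fixed_pts     -- (N, 3) landmarks in the fixed image, in mm
      moving_pts    -- (N, 3) corresponding landmarks in the moving image, in mm
      predicted_pts -- (N, 3) fixed landmarks mapped by the predicted deformation
      bin_edges     -- increasing displacement bin edges, in mm
    """
    fixed_pts = np.asarray(fixed_pts, dtype=float)
    moving_pts = np.asarray(moving_pts, dtype=float)
    predicted_pts = np.asarray(predicted_pts, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)

    # Displacement each landmark has to travel before registration.
    displacement = np.linalg.norm(moving_pts - fixed_pts, axis=1)
    # Residual error after registration.
    tre = np.linalg.norm(predicted_pts - moving_pts, axis=1)

    # np.digitize returns i such that edges[i-1] <= x < edges[i].
    bin_idx = np.digitize(displacement, edges)

    results = {}
    for i in range(1, len(edges)):
        mask = bin_idx == i
        if mask.any():  # skip empty bins rather than report NaN
            key = (float(edges[i - 1]), float(edges[i]))
            results[key] = float(tre[mask].mean())
    return results
```

Comparing the per-bin means of two models, rather than a single average TRE, is what allows a statement such as "the difference stems from small displacements" to be checked.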
Registration: I acknowledge that acceptance of this work at MIDL requires at least one of the authors to register and present the work during the conference.
Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
Paper Type: novel methodological ideas without extensive validation
Primary Subject Area: Image Registration
Secondary Subject Area: Validation Study
Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.