ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration

Junyu Chen; Yufan He; Eric Frey; Ye Li; Yong Du

Published: 11 May 2021, Last Modified: 20 Jul 2025, MIDL 2021 Poster, Readers: Everyone

Keywords: Image Registration, Vision Transformer, Convolutional Neural Networks

Abstract: In the last decade, convolutional neural networks (ConvNets) have dominated a variety of medical imaging applications and achieved state-of-the-art performance in them. However, the performance of ConvNets is still limited by their lack of understanding of long-range spatial relations in an image. The recently proposed Vision Transformer (ViT) for image classification uses a purely self-attention-based model that learns long-range spatial relations to focus on the relevant parts of an image. Nevertheless, ViT emphasizes low-resolution features because of its consecutive downsampling operations, resulting in a lack of detailed localization information that makes it unsuitable for image registration. Recently, several ViT-based image segmentation methods have been combined with ConvNets to improve the recovery of detailed localization information. Inspired by them, we present ViT-V-Net, which bridges ViT and ConvNet to provide volumetric medical image registration. The experimental results presented here demonstrate that the proposed architecture achieves superior performance to several top-performing registration methods.

Paper Type: both

Primary Subject Area: Image Registration

Secondary Subject Area: Image Registration

Paper Status: original work, not submitted yet

Source Code Url: https://github.com/junyuchen245/ViT-V-Net_for_3D_Image_Registration

Data Set Url: The MRI brain data was acquired as part of an IRB protocol and is not approved for public release.

Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.

Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/vit-v-net-vision-transformer-for-unsupervised/code)
