Hybrid Ladder Transformers with Efficient Parallel-Cross Attention for Medical Image Segmentation

Haozhe Luo; Yu Changdong; Raghavendra Selvan

Published: 28 Feb 2022, Last Modified: 16 May 2023, MIDL 2022, Readers: Everyone

Keywords: attention, transformers, u-net, segmentation

TL;DR: A dual-encoder hybrid segmentation network in which a transformer runs in parallel with the CNN encoder and global and local features are fused using bidirectional cross attention.

Abstract: Most existing transformer-based network architectures for computer vision tasks are large (in number of parameters) and require large-scale datasets for training. However, the relatively small number of data samples in medical imaging, compared to the datasets for vision applications, makes it difficult to train transformers effectively for medical imaging applications. Further, transformer-based architectures encode long-range dependencies in the data and are able to learn more global representations. This could bridge the gap with convolutional neural networks (CNNs), which primarily operate on features extracted in local image neighbourhoods. In this work, we present a hybrid transformer-based approach for segmentation of medical images that works in conjunction with a CNN. We propose to use learnable global attention heads along with the traditional convolutional segmentation network architecture to encode long-range dependencies. Specifically, in our proposed architecture the local information extracted by the convolution operations and the global information learned by the self-attention mechanisms are fused using bi-directional cross attention during the encoding process, resulting in what we call a hybrid ladder transformer (HyLT). We evaluate the proposed network on two different medical image segmentation datasets. The results show that it outperforms the relevant CNN- and transformer-based architectures.
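
The bi-directional cross attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with identity query/key/value projections, a single head, and a residual sum as the fusion step, and the names `cross_attend` and `bidirectional_cross_attention` are hypothetical. The local tokens stand in for flattened CNN features and the global tokens for transformer features.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token attends over all context tokens.

    Assumption: identity projections, so the context tokens act as both
    keys and values. A trained model would use learned projections.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def bidirectional_cross_attention(local_tokens, global_tokens):
    """Fuse two token streams in both directions.

    Local (CNN-like) tokens query the global tokens and vice versa; each
    stream is then updated with a residual sum (an assumed fusion rule).
    """
    local_update = cross_attend(local_tokens, global_tokens)
    global_update = cross_attend(global_tokens, local_tokens)
    fused_local = [
        [x + u for x, u in zip(tok, upd)]
        for tok, upd in zip(local_tokens, local_update)
    ]
    fused_global = [
        [x + u for x, u in zip(tok, upd)]
        for tok, upd in zip(global_tokens, global_update)
    ]
    return fused_local, fused_global


# Toy example: 4 local tokens and 3 global tokens, feature dimension 2.
local_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
global_tokens = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
fused_local, fused_global = bidirectional_cross_attention(
    local_tokens, global_tokens
)
```

Both streams keep their original token counts after fusion, so in a ladder-style encoder the same exchange could be repeated at each resolution level; how the actual network arranges these stages is described in the paper itself.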

Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.

Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.

Paper Type: methodological development

Primary Subject Area: Segmentation

Secondary Subject Area: Application: Histopathology

Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.
