xLSTM-UNet can be an Effective Backbone for 2D & 3D Biomedical Image Segmentation Better than its Mamba Counterparts

Published: 25 Sept 2024, Last Modified: 23 Oct 2024 · IEEE BHI'24 · CC BY 4.0
Keywords: 3D Medical Image Segmentation, Long Range Sequential Modeling, Long Short-Term Memory (LSTM), State Space Models, UNet, Vision Mamba, Vision Transformer, xLSTM
TL;DR: In this work, we replace Mamba in U-Mamba with the recent xLSTM, and surprisingly, it works well!
Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been pivotal in biomedical image segmentation. Yet, their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet-structured deep neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated superior performance compared to Transformers and State Space Models (SSMs) like Mamba in Natural Language Processing (NLP) and image classification (as demonstrated in the Vision-LSTM, or ViL, implementation). Here, we provide the first integration of xLSTM into an image segmentation backbone -- namely xLSTM-UNet, which extends the success of xLSTM to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency-capturing abilities of xLSTM, the proposed xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdominal MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this paper highlights the potential of xLSTM-based architectures in advancing biomedical image analysis in both 2D and 3D. We believe this new finding will be of interest to the research community and may inspire future studies. The code, models, and datasets are publicly available at https://github.com/tianrun-chen/xLSTM-UNet-PyTorch/tree/main.
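The abstract's core design idea -- convolutional layers for local features followed by an xLSTM-style recurrent pass over the flattened feature map for long-range context -- can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the kernel, weights, and dimensions are illustrative, and a plain LSTM cell stands in for the xLSTM/ViL block.

```python
import numpy as np

def conv3x3(x, w):
    # naive 'same' 3x3 convolution on an (H, W) map: local feature extraction
    H, W = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * w)
    return out

def lstm_scan(seq, d):
    # plain LSTM over the flattened feature sequence: long-range mixing
    # (a stand-in for the xLSTM block; random weights, for illustration only)
    rng = np.random.default_rng(0)
    Wx = rng.normal(0, 0.1, (4 * d, 1))
    Wh = rng.normal(0, 0.1, (4 * d, d))
    h, c = np.zeros(d), np.zeros(d)
    sig = lambda z: 1 / (1 + np.exp(-z))
    outs = []
    for x in seq:
        g = (Wx @ np.array([[x]]) + Wh @ h[:, None])[:, 0]
        i, f, o, u = g[:d], g[d:2 * d], g[2 * d:3 * d], g[3 * d:]
        c = sig(f) * c + sig(i) * np.tanh(u)
        h = sig(o) * np.tanh(c)
        outs.append(h.copy())
    return np.stack(outs)

def hybrid_block(x, d=4):
    # one hybrid block: conv for locality, then recurrence over the flat map
    w = np.full((3, 3), 1 / 9.0)      # toy smoothing kernel
    feat = conv3x3(x, w)              # (H, W) local features
    seq = feat.reshape(-1)            # flatten pixels into a 1D token sequence
    h = lstm_scan(seq, d)             # (H*W, d) context-mixed features
    return h.reshape(*x.shape, d)

out = hybrid_block(np.ones((4, 4)))
print(out.shape)  # (4, 4, 4)
```

In the actual network, such blocks would sit inside a UNet encoder-decoder with skip connections; the sketch only shows why pairing a convolution with a sequential scan gives each output location both local and global context.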
Track: 11. General Track
Registration Id: VSNSN79N9G7
Submission Number: 427