UltraLightUNet: Rethinking U-shaped Network with Multi-kernel Lightweight Convolutions for Medical Image Segmentation
Keywords: Ultra Lightweight CNN, Medical Imaging, Semantic Segmentation, 3D Segmentation
TL;DR: UltraLightUNet (2D and 3D) for medical image segmentation
Abstract: In this paper, we introduce UltraLightUNet (2D and 3D), an ultra-lightweight, multi-kernel U-shaped network for medical image segmentation. The core of UltraLightUNet is a new Multi-kernel Inverted Residual (MKIR) block, which efficiently processes images through multiple kernels while capturing complex spatial relationships. Additionally, our Multi-kernel Inverted Residual Attention (MKIRA) block refines and emphasizes salient image features via sophisticated convolutional multi-focal attention mechanisms. UltraLightUNet strategically employs the MKIR block in the encoder for feature extraction and the MKIRA block in the decoder for feature refinement, ensuring targeted feature enhancement at each stage. With only 0.316M #Params and 0.314G #FLOPs, UltraLightUNet offers an ultra-lightweight yet powerful segmentation solution that outperforms state-of-the-art (SOTA) methods across twelve medical imaging benchmarks. Notably, UltraLightUNet surpasses TransUNet in DICE score while using 333$\times$ fewer #Params and 123$\times$ fewer #FLOPs. Compared to the lightweight model UNeXt, UltraLightUNet improves DICE scores by up to 6.7% with 4.7$\times$ fewer #Params. UltraLightUNet also outperforms recent lightweight models such as MedT, CMUNeXt, EGE-UNet, Rolling-UNet, and UltraLight_VM_UNet, while using significantly fewer #Params and #FLOPs. Furthermore, our 3D version, UltraLightUNet3D-M (1.42M #Params, 7.1G #FLOPs), outperforms SwinUNETR (62.19M #Params, 328.6G #FLOPs) and nn-UNet (31.2M #Params, 110.4G #FLOPs) on the FETA, MSD Brain Tumor, Prostate, and Lung Cancer segmentation benchmarks. This remarkable performance, combined with substantial computational gains, makes UltraLightUNet an ideal solution for real-time and point-of-care services in resource-constrained environments. We will make the code publicly available upon paper acceptance.
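The abstract names a "Multi-kernel Inverted Residual (MKIR)" block but does not detail its internal layout. The following is a minimal, hypothetical PyTorch sketch assuming a MobileNet-style inverted residual whose depthwise stage runs several kernel sizes in parallel; the class name MKIRBlock and the expansion/kernel_sizes parameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class MKIRBlock(nn.Module):
    """Hypothetical Multi-kernel Inverted Residual (MKIR) block.

    Sketch only: assumes a 1x1 expansion, parallel depthwise
    convolutions with several kernel sizes (summed), and a 1x1
    projection with a residual connection. The paper's exact
    design may differ.
    """

    def __init__(self, channels, expansion=2, kernel_sizes=(1, 3, 5)):
        super().__init__()
        hidden = channels * expansion
        # 1x1 pointwise expansion
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.GELU(),
        )
        # Parallel depthwise convolutions, one per kernel size
        self.branches = nn.ModuleList(
            nn.Conv2d(hidden, hidden, k, padding=k // 2,
                      groups=hidden, bias=False)
            for k in kernel_sizes
        )
        self.norm = nn.BatchNorm2d(hidden)
        self.act = nn.GELU()
        # 1x1 pointwise projection back to the input width
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        y = self.expand(x)
        # Sum the multi-kernel depthwise responses, then normalize
        y = self.act(self.norm(sum(b(y) for b in self.branches)))
        return x + self.project(y)  # residual connection


if __name__ == "__main__":
    block = MKIRBlock(channels=16)
    out = block(torch.randn(1, 16, 64, 64))
    print(out.shape)  # torch.Size([1, 16, 64, 64])
```

Depthwise convolutions keep the parameter count roughly linear in the number of kernel sizes, which is consistent with the sub-million #Params budget the abstract reports; a 3D variant would swap Conv2d/BatchNorm2d for their 3D counterparts.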
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8307