Keywords: Medical Image Segmentation, Attention-Free Architectures, Token Mixer, Circulant Networks, Encoder-Decoder, Lightweight Models, Global Context Modeling, Biomedical Imaging, Convolutional Neural Networks, Deep Learning
TL;DR: Med-SegNet is a lightweight, attention-free model for medical image segmentation that resolves the locality-globality paradox by using an efficient circulant token mixer to achieve high accuracy at minimal computational cost.
Abstract: Medical image segmentation requires both fine local detail and reliable global context, yet common solutions trade accuracy for efficiency: CNNs are local and cheap, whereas transformers are global but quadratic in cost. We introduce Med-SegNet, a compact encoder–bottleneck–decoder architecture that couples inverted residual SE blocks with a Circulant Layer Token Mixer (CLTM) placed once at the bottleneck. CLTM performs a single global information exchange: it projects multi-scale encoder features to a shared token space, applies a depthwise 1D circular convolution with pre/post normalization, and then re-projects the mixed tokens back to each scale through residual connections. This attention-free design uses only standard convolutions, yielding near-linear mixing cost, low memory, and hardware-friendly deployment. Across 20 public datasets spanning 12 modalities, Med-SegNet with CLTM improves Dice on every dataset (20/20) over the ablated model, raising the mean from 0.8977 to 0.9161. Gains are largest in challenging, low-contrast settings such as BUSI ultrasound (+6.31 points) and RaViR ophthalmology (+6.12 points), while near-ceiling performance is preserved on easier benchmarks. Despite a budget of roughly 2.07M parameters, Med-SegNet attains leading or competitive results, including Dice scores of 0.9672 on Kvasir-SEG, 0.9666 on CVC-ClinicDB, and 0.9612 on ETIS. By supplying global context at minimal cost, CLTM delivers sharper boundaries, improved long-range coherence, and practical latency, offering an accuracy–efficiency point well suited to real-world clinical workflows.
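To make the mixing step in the abstract concrete, the following is a minimal NumPy sketch of a depthwise 1D circular convolution over tokens with pre/post normalization and a residual connection. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of per-token layer normalization, the kernel length, and the exact placement of the residual are all hypothetical, and the multi-scale projection to and from the shared token space is omitted.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token (row) over its channel dimension (assumed norm type)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def circular_depthwise_conv1d(tokens, kernels):
    """Depthwise circular convolution along the token axis.

    tokens:  (N, C) array, N tokens with C channels.
    kernels: (K, C) array, one length-K filter per channel (K <= N).
    out[n, c] = sum_k kernels[k, c] * tokens[(n - k) mod N, c]
    """
    out = np.zeros(tokens.shape, dtype=float)
    for k in range(kernels.shape[0]):
        # np.roll by k along axis 0 yields rolled[n] = tokens[(n - k) % N],
        # so indices wrap around instead of being zero-padded.
        out += kernels[k] * np.roll(tokens, k, axis=0)
    return out


def circulant_token_mixer(tokens, kernels):
    """Pre-norm -> circular depthwise mixing -> post-norm -> residual add."""
    mixed = circular_depthwise_conv1d(layer_norm(tokens), kernels)
    return tokens + layer_norm(mixed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(8, 4))   # 8 tokens, 4 channels (toy sizes)
    kernels = rng.normal(size=(3, 4))  # hypothetical kernel length of 3
    print(circulant_token_mixer(tokens, kernels).shape)
```

Because each channel is filtered independently and indices wrap around, every channel's mixing matrix is circulant. The direct loop above costs O(N·K·C) for N tokens, K taps, and C channels; the same result can be obtained with an FFT along the token axis in O(N log N) per channel. Which of these, if either, underlies the near-linear cost stated in the abstract is not specified there.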
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14216