TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation

Published: 2025 · Last Modified: 21 Jan 2026 · CoRR 2025 · CC BY-SA 4.0
Abstract: 3D medical image segmentation is important for clinical diagnosis and treatment but faces challenges from high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. To alleviate these limitations, we propose TK-Mamba, a multimodal framework that fuses linear-time Mamba with Kolmogorov-Arnold Networks (KAN) to form an efficient hybrid backbone. Our approach makes two primary technical contributions. First, we introduce the novel 3D-Group-Rational KAN (3D-GR-KAN), the first application of KAN to 3D medical imaging, which provides an expressive and computationally efficient nonlinear feature transformation suited to complex volumetric structures. Second, we devise a dual-branch text-driven strategy built on PubMedCLIP embeddings, which improves segmentation robustness and accuracy by capturing inter-organ semantic relationships to mitigate label inconsistencies while aligning image features with anatomical text descriptions. By combining this backbone with vision-language knowledge, TK-Mamba offers a unified and scalable solution for both multi-organ and tumor segmentation. Experiments on multiple datasets demonstrate that our framework achieves state-of-the-art performance in both organ and tumor segmentation, surpassing existing methods in accuracy and efficiency. Our code is publicly available at https://github.com/yhy-whu/TK-Mamba.
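
To make the hybrid backbone concrete, below is a minimal PyTorch sketch of one block: Mamba mixes the flattened voxel tokens in linear time, and a group-rational KAN layer acts as the channel-wise nonlinearity. The class names, the rational parameterization y = P(x)/(1 + |Q(x)|), and all hyperparameters are illustrative assumptions, not the authors' 3D-GR-KAN implementation; the token mixer uses the official mamba-ssm package.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # official mamba-ssm package (requires a CUDA build)

class GroupRationalKAN3D(nn.Module):
    """Group-wise rational activation y = P(x) / (1 + |Q(x)|): one learnable
    coefficient set shared within each channel group (a guess at 3D-GR-KAN)."""
    def __init__(self, channels, num_groups=8, p_degree=5, q_degree=4):
        super().__init__()
        assert channels % num_groups == 0
        self.num_groups = num_groups
        # Numerator P and denominator Q polynomial coefficients, one row per group.
        self.p = nn.Parameter(torch.randn(num_groups, p_degree + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(num_groups, q_degree) * 0.1)

    def forward(self, x):  # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        g = self.num_groups
        xg = x.reshape(b, g, c // g, d, h, w)
        num = torch.zeros_like(xg)
        den = torch.zeros_like(xg)
        for k in range(self.p.shape[1]):   # P(x) = sum_k p_k * x^k
            num = num + self.p[:, k].view(1, g, 1, 1, 1, 1) * xg.pow(k)
        for k in range(self.q.shape[1]):   # Q(x) = sum_k q_k * x^(k+1)
            den = den + self.q[:, k].view(1, g, 1, 1, 1, 1) * xg.pow(k + 1)
        return (num / (1.0 + den.abs())).reshape(b, c, d, h, w)  # pole-free

class HybridMambaKANBlock(nn.Module):
    """Mamba mixes voxel tokens in linear time; the group-rational KAN mixes
    channels. A sketch of the hybrid design, not the paper's exact block."""
    def __init__(self, channels, num_groups=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mamba = Mamba(d_model=channels)  # linear-time sequence model
        self.kan = GroupRationalKAN3D(channels, num_groups)

    def forward(self, x):  # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, D*H*W, C) voxel tokens
        seq = seq + self.mamba(self.norm(seq))   # residual token mixing
        x = seq.transpose(1, 2).reshape(b, c, d, h, w)
        return x + self.kan(x)                   # residual channel mixing
```

The guarded denominator 1 + |Q(x)| keeps the rational activation free of poles, a common way to stabilize rational-function KAN layers during training.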
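The text branch can be sketched similarly. The snippet below encodes anatomical class prompts with PubMedCLIP's text tower and pulls pooled image features toward their class embedding with a cosine loss. The checkpoint name, prompt template, projection head, and loss are assumptions for illustration, not the paper's dual-branch design.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

# Assumed public PubMedCLIP checkpoint on the Hugging Face hub.
CKPT = "flaviagiammarino/pubmed-clip-vit-base-patch32"

@torch.no_grad()
def anatomical_text_embeddings(class_names):
    """Encode one prompt per organ/tumor class with PubMedCLIP's text encoder."""
    tok = CLIPTokenizer.from_pretrained(CKPT)
    clip = CLIPModel.from_pretrained(CKPT).eval()
    prompts = [f"a computed tomography of a {name}" for name in class_names]
    inputs = tok(prompts, padding=True, return_tensors="pt")
    emb = clip.get_text_features(**inputs)   # (num_classes, 512) for ViT-B/32
    return F.normalize(emb, dim=-1)

def alignment_loss(voxel_feats, labels, text_emb, proj):
    """Cosine-distance alignment of image features to frozen text embeddings.

    voxel_feats: (N, C) pooled per-class image features from the backbone
    labels:      (N,) class indices
    text_emb:    (num_classes, 512) frozen PubMedCLIP embeddings
    proj:        nn.Linear(C, 512) learnable projection (an assumption)
    """
    img = F.normalize(proj(voxel_feats), dim=-1)
    return (1.0 - (img * text_emb[labels]).sum(-1)).mean()
```

Freezing the text tower and learning only a light projection is a standard way to inject vision-language priors without fine-tuning CLIP itself.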