Keywords: DINOv3, Medical Image Segmentation, Mixture-of-Experts
TL;DR: We developed a mixture-of-experts-guided dual-transformer model for multi-scale medical image segmentation.
Abstract: Precise delineation of anatomical structures in medical images is critical for clinical diagnosis and treatment planning, yet it remains challenging because of ambiguous boundaries, extreme scale variations, and the heterogeneous appearance of pathological tissues. Current segmentation methods often fail to balance global contextual understanding with adaptive, multi-scale feature fusion, which limits their robustness across diverse clinical scenarios. To address these limitations, we propose D$^2$-Former, a novel encoder-decoder framework that integrates a dual-encoder architecture, combining a Swin Transformer for hierarchical local-global modeling with a DINOv3 foundation model for high-fidelity dense feature extraction, and a Softer Mixture-of-Experts (Softer-MoE) module for input-adaptive feature refinement. Our design further introduces a Spatial-Frequency Gated Channel Attention (SF-GCA) module to fuse the complementary encoder representations and a Residual Attention Decoder (RAD) with deep supervision for progressive segmentation map reconstruction. Extensive experiments on nine public benchmarks spanning polyp segmentation, retinal vessel delineation, multi-organ abdominal CT segmentation, and nuclei instance segmentation demonstrate that D$^2$-Former achieves state-of-the-art or highly competitive performance. The model generalizes well across anatomical scales, imaging modalities, and clinical scenarios, underscoring its potential for reliable computer-assisted diagnosis.
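The abstract does not specify how the Softer-MoE module routes features. As a rough illustration of what input-adaptive refinement with soft expert gating can look like, the sketch below implements a generic softmax-gated (dense) mixture of linear experts with a residual connection in NumPy. The function name `soft_moe_refine`, the linear experts, and the residual form are assumptions chosen for illustration; this is not the D$^2$-Former implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_moe_refine(features, gate_w, expert_ws):
    """Generic soft-gated mixture of experts (hypothetical illustration).

    features:  (N, C) array of N token/pixel features with C channels.
    gate_w:    (C, E) gating projection giving one logit per expert.
    expert_ws: list of E (C, C) matrices, each acting as a linear expert.
    Returns the (N, C) refined features and the (N, E) gate weights.
    """
    # Per-input gate weights; every row sums to 1, so all experts contribute softly.
    gates = softmax(features @ gate_w, axis=-1)                        # (N, E)
    # Apply every expert to every input feature.
    expert_out = np.stack([features @ w for w in expert_ws], axis=1)   # (N, E, C)
    # Input-dependent weighted sum of the expert outputs.
    mixed = (gates[..., None] * expert_out).sum(axis=1)                # (N, C)
    # Residual connection: refine rather than replace the input features.
    return features + mixed, gates


# Usage with random toy data: 16 features, 8 channels, 4 experts.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
gw = rng.standard_normal((8, 4))
experts = [0.1 * rng.standard_normal((8, 8)) for _ in range(4)]
y, g = soft_moe_refine(x, gw, experts)
print(y.shape, g.shape)  # (16, 8) (16, 4)
```

Dense soft gating keeps every expert active for every input, so the routing is differentiable without a top-k selection step; sparse variants trade this for lower compute.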
Primary Subject Area: Application: Radiology
Secondary Subject Area: Segmentation
Registration Requirement: Yes
Reproducibility: https://github.com/Shuvo001/MoE-DT
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Replace NNN with your OpenReview submission ID., Includes \documentclass{midl}, \jmlryear{2026}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package., Did not use the times package., Use the correct spelling and format, avoid Unicode characters, and use LaTeX equivalents instead., Any math in the title and abstract must be enclosed within $...$., Did not override the bibliography style defined in midl.cls and did not use \begin{thebibliography} directly to insert references., Avoid using \scalebox; use \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., No separate supplementary PDF uploads., Acknowledgements, references, and appendix must start after the main content.
Latex Code: zip
Copyright Form: pdf
Submission Number: 23