Abstract: With the wide adoption of the Mamba framework, state-space models have achieved outstanding results in computer vision. Nevertheless, their advantage over CNN-based and Transformer-based counterparts remains limited, because they struggle with local region feature extraction owing to deficient positional awareness and a disproportionate emphasis on posterior tokens in their pre-defined scanning schedules. Moreover, most current Mamba-based models fail to exploit the benefits of integrating multiple visual encoding strategies. Against this background, we propose DUMFNet, a novel DoubleU-Net framework with multiple visual encoding strategies and a local-based scanning mechanism. Comparative and ablation experiments against current SOTA methods verify the superiority or competitiveness of DUMFNet. For reproduction, the implementation code is available at https://github.com/Panpps202006/DUMFNet.
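To illustrate the idea behind a local-based scanning mechanism (the details of the paper's actual scheme are in its method section; this is only a minimal sketch with a hypothetical helper name), the code below reorders flattened image-token indices so that tokens within each local window are visited consecutively, rather than in a global raster scan that separates vertical neighbours.

```python
def local_scan_order(h, w, win):
    """Return flattened token indices visited window-by-window.

    Tokens inside each win x win window are read in raster order
    before the scan moves to the next window, so spatially adjacent
    tokens stay close in the 1-D sequence fed to the state-space model.
    """
    order = []
    for wy in range(0, h, win):            # iterate over window rows
        for wx in range(0, w, win):        # iterate over window columns
            for y in range(wy, min(wy + win, h)):
                for x in range(wx, min(wx + win, w)):
                    order.append(y * w + x)
    return order

# On a 4x4 token grid with 2x2 windows, the first window groups
# tokens 0, 1 (row 0) with 4, 5 (row 1) before moving right.
print(local_scan_order(4, 4, 2)[:4])  # → [0, 1, 4, 5]
```

A global raster scan would instead visit token 4 only after the entire first row, which is one source of the weak locality the abstract attributes to pre-defined scanning schedules.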