Keywords: Low-level vision; Efficient Image Super-resolution; State Space Model
Abstract: The state space model (SSM) has recently garnered significant attention for its long-range modeling capability at linear-time complexity, enabling notable success in efficient super-resolution. However, applying SSMs to vision tasks typically requires flattening 2D visual data into a 1D sequence for scanning, which disrupts inherent semantic relationships and introduces artifacts and distortions during image restoration. To address these challenges, we propose SP-MoMamba, a novel method that integrates SSMs with the semantic-preservation capability of superpixels and the efficiency advantage of Mixture-of-Experts (MoE). Specifically, we pioneer the use of superpixel features as semantic units to redesign the SSM scanning order, proposing the Superpixel-driven State Space Model (SP-SSM) as the basic building block of SP-MoMamba. Furthermore, we introduce the Multi-Scale Superpixel Mixture of State Space Experts (MSS-MoE) scheme to strategically integrate SP-SSMs across scales, effectively harnessing complementary semantic information from multiple experts. This multi-scale expert integration significantly reduces the number of pixels processed by each SSM while enhancing the reconstruction of fine details through specialized experts operating at different semantic scales. The framework thus delivers superior performance with minimal computational overhead.
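To make the scanning idea concrete, here is a minimal sketch of superpixel-driven scanning in the spirit of SP-SSM. It assumes superpixel labels from scikit-image's SLIC; the function names (`superpixel_order`, `sp_ssm_scan`) and the simple exponential-moving-average recurrence standing in for the actual selective state space (Mamba) scan are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import torch
from skimage.segmentation import slic  # superpixel segmentation


def superpixel_order(image_np: np.ndarray, n_segments: int) -> torch.Tensor:
    """Permutation that makes pixels of the same superpixel contiguous."""
    labels = slic(image_np, n_segments=n_segments, compactness=10.0)  # (H, W)
    flat = torch.from_numpy(labels.reshape(-1).astype(np.int64))
    return torch.argsort(flat)  # grouped scan order, one run per superpixel


def sp_ssm_scan(feat: torch.Tensor, perm: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """feat: (L, C) flattened pixel features. Scan in superpixel order, restore layout."""
    x = feat[perm]                      # reorder: each superpixel is a contiguous run
    h = torch.zeros_like(x[0])
    out = torch.empty_like(x)
    for t in range(x.shape[0]):         # placeholder recurrence for the SSM scan
        h = alpha * h + (1.0 - alpha) * x[t]
        out[t] = h
    inv = torch.empty_like(perm)
    inv[perm] = torch.arange(perm.numel())
    return out[inv]                     # scatter back to the original pixel layout
```

Likewise, a hedged sketch of the multi-scale mixture idea behind MSS-MoE: a per-pixel soft gate fuses SP-SSM experts that scan in superpixel orderings at different scales (different `n_segments`). The class name, the linear gate, and the soft (rather than top-k) routing are assumptions for illustration; `sp_ssm_scan` is the function from the sketch above.

```python
import torch
import torch.nn as nn


class MSSMoE(nn.Module):
    """Hypothetical multi-scale superpixel mixture of state space experts."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        self.gate = nn.Linear(channels, num_scales)  # per-pixel routing logits

    def forward(self, feat: torch.Tensor, perms: list) -> torch.Tensor:
        # feat: (L, C); perms[i] is the superpixel scan order at scale i
        weights = torch.softmax(self.gate(feat), dim=-1)            # (L, E)
        expert_outs = torch.stack(
            [sp_ssm_scan(feat, p) for p in perms], dim=-1)          # (L, C, E)
        return (expert_outs * weights.unsqueeze(1)).sum(dim=-1)     # weighted fusion
```

Under this reading, coarser scales give each expert longer, semantically coherent runs while finer scales keep the per-run pixel count small, which is consistent with the abstract's claim of reduced per-SSM workload with improved detail reconstruction.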
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8304