Semantic Concept Conditioning for State Space Image Super-Resolution

Published: 24 Apr 2026, Last Modified: 01 Jun 2026
Venue: VisCon 2026 Poster
License: CC BY 4.0
Keywords: super-resolution, state space models, mamba, semantic guidance
TL;DR: We leverage discovered visual concepts as explicit semantic priors to enable non-causal global conditioning in a linear-complexity State Space Model, achieving state-of-the-art single-image super-resolution without multi-directional scanning.
Abstract: Structured visual concepts, such as semantic regions discovered within an image, offer an underexplored source of prior knowledge for low-level vision. Single-image super-resolution (SISR) requires recovering both global structural coherence and fine local details, yet most existing methods treat the input as an unstructured pixel grid, neglecting the rich conceptual organization inherent in natural scenes. We propose a dual-branch CNN-SSM architecture that explicitly leverages discovered visual concepts as computational primitives. Our Semantic-Guided Grouping Network (SGGN) extracts instance-level concept masks via lightweight segmentation, using them to dynamically reorder tokens for a State Space Model (SSM). The Semantic Attentive State Space Equation (SASSE) injects these concept-level priors into the SSM's readout, enabling non-causal global conditioning with a single scan at linear complexity. To preserve intra-concept spatial topology, we introduce geometry-aware traversals and stochastic concept shuffling, preventing the model from memorizing spurious concept orderings. Ensemble Consistency Regularization coordinates the heterogeneous branches during training. Our approach demonstrates that principled integration of visual concept representations substantially enhances structural coherence in image restoration, achieving state-of-the-art performance across standard SISR benchmarks.
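Illustrative sketch (not the authors' implementation): the token reordering described in the abstract can be pictured as a permutation that places tokens of the same concept next to each other before a single scan, with the concept visiting order randomized during training. The function name `concept_order`, the integer label map, and the use of plain raster order inside each concept (standing in for the geometry-aware traversal, which the abstract does not specify) are all assumptions made for this example.

```python
import numpy as np

def concept_order(labels, rng=None):
    """Return (perm, inverse) that groups tokens by concept label.

    labels: (H, W) array of non-negative integer concept ids
            (a hypothetical segmentation output).
    rng:    optional np.random.Generator; when given, concepts are
            visited in a random order (stochastic concept shuffling).
    Tokens inside one concept keep their raster order, an assumed
    stand-in for a geometry-aware traversal.
    """
    flat = labels.ravel()
    concepts = np.unique(flat)
    if rng is not None:
        concepts = rng.permutation(concepts)
    # rank[c] = position of concept c in the visiting order
    rank = np.empty(concepts.max() + 1, dtype=np.int64)
    rank[concepts] = np.arange(concepts.size)
    # A stable sort preserves raster order within each concept.
    perm = np.argsort(rank[flat], kind="stable")
    # inverse undoes the reordering after the scan.
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.size)
    return perm, inverse

# Toy 2x3 label map with three concepts and 4-dimensional tokens.
labels = np.array([[0, 0, 1],
                   [2, 1, 1]])
tokens = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)

perm, inverse = concept_order(labels)
sequence = tokens[perm]        # concept-grouped sequence fed to the scan
restored = sequence[inverse]   # back to the original spatial layout
```

Passing `rng=np.random.default_rng(seed)` changes only the order in which concepts appear in the sequence; each concept's tokens stay contiguous, which is the property the shuffling relies on.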
Submission Number: 21