Multiple Span Bidirectional RWKV Network for Infrared Image Super-Resolution

Published: 2025, Last Modified: 25 Jan 2026. Int. J. Mach. Learn. Cybern. 2025. License: CC BY-SA 4.0
Abstract: Modeling long-range dependencies is essential for infrared image super-resolution. Transformers can model these interactions, but their quadratic complexity limits their usability, especially for high-resolution infrared images. The RWKV model, known for handling long sequences efficiently in NLP, offers a potential solution for modeling global interactions. To harness this model for infrared image super-resolution, we introduce a multiple span bidirectional RWKV network, termed MSB-RWKV, which efficiently models long-range dependencies and local context restoration cues in infrared images. To enable the RWKV model to process 2D infrared images effectively, we incorporate two strategies tailored to this application. First, we propose a Multiple Span Bidirectional WKV (MSB-WKV) attention mechanism, which achieves global dependency modeling with linear complexity. By integrating bidirectional attention and multi-span scanning, it captures a global receptive field while handling 2D spatial correlations across various directions. Second, we design a Wide Token Shift (Wide Shift) layer to enhance the network’s ability to model local dependencies. This layer shifts tokens in multiple directions over an extended context, allowing it to capture fine-grained details and contextual features in infrared images. In addition, a prompt projection module uses learnable visual prompts to characterize and capture restoration cues more robustly across diverse degradations. Together, these components make MSB-RWKV an efficient and effective model for infrared image super-resolution. Comprehensive experiments show that MSB-RWKV outperforms state-of-the-art methods.
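The abstract's claim that bidirectional WKV attention reaches every token at linear cost can be illustrated with a small sketch. This is not the paper's MSB-WKV: it is a simplified, single-channel bidirectional WKV-style weighted average over a 1D token sequence, with no multi-span scanning. The function name `bi_wkv`, the scalar decay `w`, and the current-token bonus `u` are assumptions made for illustration. Each output mixes all values, weighted by `exp(k_i)` and decayed by distance from the current position, and the forward and backward recurrences each take one pass over the sequence.

```python
import math


def bi_wkv(k, v, w=0.5, u=0.0):
    """Simplified bidirectional WKV-style mixing in O(n) time.

    k, v: lists of floats (one key and one value per token).
    w:    non-negative decay rate per step of distance (assumed scalar).
    u:    extra log-weight given to the current token (assumed scalar).
    """
    n = len(k)
    decay = math.exp(-w)

    # Forward pass: state at t summarizes tokens i < t,
    # each weighted by exp(k_i) * decay ** (t - 1 - i).
    fwd_num, fwd_den = [0.0] * n, [0.0] * n
    num = den = 0.0
    for t in range(n):
        fwd_num[t], fwd_den[t] = num, den
        e = math.exp(k[t])
        num = decay * num + e * v[t]
        den = decay * den + e

    # Backward pass: state at t summarizes tokens i > t,
    # each weighted by exp(k_i) * decay ** (i - 1 - t).
    bwd_num, bwd_den = [0.0] * n, [0.0] * n
    num = den = 0.0
    for t in reversed(range(n)):
        bwd_num[t], bwd_den[t] = num, den
        e = math.exp(k[t])
        num = decay * num + e * v[t]
        den = decay * den + e

    # Combine both directions with the current token's own contribution.
    out = []
    for t in range(n):
        e = math.exp(u + k[t])
        out.append((fwd_num[t] + bwd_num[t] + e * v[t])
                   / (fwd_den[t] + bwd_den[t] + e))
    return out


if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.0]
    values = [1.0, 2.0, -1.0, 4.0]
    print(bi_wkv(keys, values, w=0.7, u=0.2))
```

Because every output is a normalized weighted average of the values, a constant value sequence is returned unchanged, which is a convenient sanity check. A 2D image would first have to be flattened into one or more token sequences; how the paper does this across multiple spans and directions is not specified in the abstract and is not modeled here.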