WKV-sharing embraced random shuffle RWKV high-order modeling for pan-sharpening

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Pan-sharpening, Image Fusion
TL;DR: This study pioneers a framework to address theoretical bias in RWKV and explore its potential for multi-spectral and panchromatic fusion, bridging a key gap in remote sensing image fusion.
Abstract: Pan-sharpening aims to generate a spatially and spectrally enriched multi-spectral image by integrating complementary cross-modality information from a low-resolution multi-spectral image and its texture-rich panchromatic counterpart. In this work, we propose a WKV-sharing embraced random-shuffle RWKV high-order modeling paradigm for pan-sharpening from a Bayesian perspective, coupled with a random weight manifold distribution training strategy derived from functional theory that regularizes the solution space according to the following principles:

1) Random-shuffle RWKV. The Vision RWKV model, with its inherent linear complexity in global modeling, has inspired us to explore its untapped potential for pan-sharpening. However, its attention mechanism relies on a recurrent bidirectional scanning strategy, which suffers from biased effects and demands significant processing time. To address this, we propose a Bayesian-inspired scanning strategy called Random Shuffle, complemented by a theoretically sound inverse shuffle that preserves information-coordination invariance and effectively eliminates the biases associated with fixed-sequence scanning. Random Shuffle removes preconceptions about global 2D dependencies in mathematical expectation, providing the model with an unbiased prior. In a spirit similar to Dropout, we introduce a testing methodology based on Monte Carlo averaging so that the model's output aligns more closely with its expectation.

2) WKV-sharing high-order. For the KV attention-score calculation in the spatial mixer of RWKV, we leverage a WKV-sharing mechanism that transfers KV activations across RWKV layers, achieving lower latency and improved trainability. We also revisit the channel mixer in RWKV, originally a first-order weighting function, and develop its high-order potential by sharing the gate mechanism across RWKV layers.

Comprehensive experiments across pan-sharpening benchmarks demonstrate our model's effectiveness, consistently outperforming state-of-the-art alternatives.
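The abstract's Random Shuffle idea (shuffle the token scan order, apply the sequence mixer, invert the shuffle, and average several shuffled passes at test time) can be sketched as below. This is a minimal illustration, not the authors' code: `SeqMixer` is a hypothetical placeholder for a Vision RWKV spatial-mix block, and `mc_samples` is an assumed name for the number of Monte Carlo passes.

```python
# Minimal sketch of the Random Shuffle scanning idea (assumptions noted above).
import torch
import torch.nn as nn


class SeqMixer(nn.Module):
    """Placeholder for an RWKV-style spatial mixer over tokens of shape (B, N, C)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return tokens + self.proj(tokens)


class RandomShuffleBlock(nn.Module):
    def __init__(self, dim: int, mc_samples: int = 4):
        super().__init__()
        self.mixer = SeqMixer(dim)
        self.mc_samples = mc_samples  # Monte Carlo passes averaged at test time

    def _one_pass(self, tokens: torch.Tensor) -> torch.Tensor:
        b, n, c = tokens.shape
        perm = torch.randperm(n, device=tokens.device)  # random scan order
        inv = torch.argsort(perm)                       # inverse shuffle
        mixed = self.mixer(tokens[:, perm])             # mix tokens in shuffled order
        return mixed[:, inv]                            # restore original positions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) image features -> flatten to a token sequence
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        if self.training:
            out = self._one_pass(tokens)
        else:
            # Monte Carlo averaging over several random scan orders
            out = torch.stack(
                [self._one_pass(tokens) for _ in range(self.mc_samples)]
            ).mean(0)
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = RandomShuffleBlock(dim=32)
    feats = torch.randn(2, 32, 16, 16)
    print(block(feats).shape)  # torch.Size([2, 32, 16, 16])
```

Because each forward pass shuffles the scan order independently, no fixed ordering of the 2D tokens is privileged in expectation; the inverse permutation keeps outputs aligned with their original spatial positions.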
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 572