RSTSIC: Reparameterized Swin Transformer Stereo Image Compression

Published: 01 Sept 2025, Last Modified: 18 Nov 2025, ACML 2025 Conference Track, CC BY 4.0
Abstract: Stereo image compression (SIC) aims to improve compression performance and efficiency by exploiting cross-view redundancy in the overlapping fields of view of stereo image pairs. However, current SIC methods face practical limitations in adequately exploiting inter-view correlations and contextual information, owing to occlusions, disparity variations, and computational overhead. To extract contextual information effectively and model cross-view dependencies efficiently, we propose a novel distributed stereo image compression framework, Reparameterized Swin Transformer Stereo Image Compression (RSTSIC), which integrates a Reparameterized Swin Block (RSB) and Cross Feature Enhancement Modules (CFEMs) into the joint decoder. The CFEMs progressively aggregate cross-view dependencies and make cross-feature interaction more efficient. The RSB combines window-based self-attention with convolutional operations to leverage non-local contextual information, while structural reparameterization keeps inference efficient. RSTSIC outperforms traditional codecs and deep stereo compression methods on both the Cityscapes and InStereo2K datasets, with at least a 58.57% reduction in model parameters and a 36.43% decrease in FLOPs compared to state-of-the-art compression models. Ablation studies confirm that both the CFEMs and the RSB are necessary for efficient compression and perceptual fidelity. Our code is available at https://github.com/SnowBlind0/RSTSIC.
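The abstract credits structural reparameterization for the RSB's inference efficiency but does not say which branches are merged. As a hedged illustration only, the sketch below shows the generic RepVGG-style form of the idea, which is an assumption and not necessarily what the RSB does: because convolution is linear, parallel 3×3, 1×1, and identity branches that are summed during training can be folded into a single 3×3 kernel for inference. All function names are hypothetical, and the sketch uses a single channel, no bias, and no batch normalization to stay short.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation (what deep-learning 'conv' layers compute)
    with zero padding, so the output has the same size as the input.
    Assumes an odd, square kernel."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the image
                        s += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = s
    return out


def centre_3x3(value: float) -> Matrix:
    """A 1x1 kernel rewritten as an equivalent 3x3 kernel: weight at the centre, zeros elsewhere."""
    return [[0.0, 0.0, 0.0], [0.0, value, 0.0], [0.0, 0.0, 0.0]]


def multi_branch(x: Matrix, k3: Matrix, k1: float) -> Matrix:
    """Hypothetical training-time block: 3x3 conv + 1x1 conv + identity, summed."""
    a = conv2d_same(x, k3)
    b = conv2d_same(x, [[k1]])
    return [[a[i][j] + b[i][j] + x[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def reparameterize(k3: Matrix, k1: float) -> Matrix:
    """Hypothetical inference-time kernel: by linearity, the three branches collapse
    into one 3x3 kernel. The identity branch is a 1x1 kernel with weight 1."""
    k1_as_3x3 = centre_3x3(k1)
    identity = centre_3x3(1.0)
    return [[k3[i][j] + k1_as_3x3[i][j] + identity[i][j] for j in range(3)] for i in range(3)]


# Small demo with arbitrary illustrative values (not taken from the paper).
x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.5, 1.5, -0.5]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = 0.7

y_train = multi_branch(x, k3, k1)                    # three branches
y_infer = conv2d_same(x, reparameterize(k3, k1))     # one merged convolution
max_diff = max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb))
print(f"max |difference| = {max_diff:.2e}")
```

The two outputs agree up to floating-point rounding, so the deployed model runs one convolution instead of three branches. Multi-channel kernels merge the same way, and inference-time batch normalization, being an affine map, can be folded into the merged kernel and a bias by a similar argument.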
Supplementary Material: pdf
Submission Number: 170