RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining

Published: 20 Jul 2024, Last Modified: 05 Aug 2024, MM2024 Oral, CC BY 4.0
Abstract: Outdoor vision systems are frequently contaminated by rain streaks and raindrops, which significantly degrade the performance of visual tasks and multimedia applications. Videos offer redundant temporal cues that make rain removal more stable. Traditional video deraining methods rely heavily on optical flow estimation and kernel-based approaches, which have a limited receptive field. Transformer architectures enable long-term dependencies but bring a significant increase in computational complexity. Recently, the linear-complexity operator of state space models (SSMs) has, in contrast, facilitated efficient long-term temporal modeling, which is crucial for removing rain streaks and raindrops from videos. However, its one-dimensional sequential processing of videos destroys local correlations across the spatio-temporal dimensions by distancing adjacent pixels. To address this, we present an improved SSM-based video deraining network (RainMamba) with a novel Hilbert scanning mechanism to better capture sequence-level local information. We also introduce a difference-guided dynamic contrastive locality learning strategy to enhance the patch-level self-similarity learning ability of the proposed network. Extensive experiments on four synthesized video deraining datasets and real-world rainy videos demonstrate the superiority of our network in removing rain streaks and raindrops. Our code and results are available at https://github.com/TonyHongtaoWu/RainMamba.
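To illustrate why scan order matters for locality, the following is a minimal sketch comparing a row-major (raster) scan with a 2D Hilbert-curve scan on a small square grid. It is not the RainMamba implementation: the function names (`hilbert_d2xy`, `hilbert_order`, `raster_order`, `max_step`) are hypothetical, the grid side is assumed to be a power of two, and the example is purely spatial and does not cover the temporal dimension.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    Assumes n is a power of two. Standard iterative construction.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """Return all grid cells of an n x n grid in Hilbert-scan order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def raster_order(n):
    """Return all grid cells of an n x n grid in row-major order."""
    return [(x, y) for y in range(n) for x in range(n)]


def max_step(order):
    """Largest Manhattan distance between consecutive cells in a scan."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    n = 8
    print("raster max step: ", max_step(raster_order(n)))
    print("hilbert max step:", max_step(hilbert_order(n)))
```

In the Hilbert ordering, consecutive sequence elements are always grid neighbors, whereas a raster scan jumps across the whole row at each line end. This is the kind of locality that a one-dimensional sequence model can lose when a frame is flattened.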
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: This work develops a computer vision technique that improves the quality of multimedia content by removing rain streaks, raindrops, and other interference from videos. Videos captured by outdoor systems, e.g., surveillance cameras and mobile devices, are often corrupted by both rain streaks and raindrops, which damages perceptual visual quality and tends to degrade the performance of subsequent outdoor computer vision and multimedia computing tasks, e.g., object detection, semantic segmentation, and autonomous driving. Rain removal is therefore a crucial pre-processing step for improving the robustness of outdoor intelligent systems and multimedia applications. In recent years, many video deraining papers have been published at ACM Multimedia, which also supports the relevance of our work to the development of multimedia applications.
Supplementary Material: zip
Submission Number: 898