Abstract: Cloud removal aims to restore high-quality images from cloud-contaminated captures, which is essential in remote sensing applications. Effectively modeling the long-range relationships between image features is key to achieving high-quality cloud-free images. While self-attention mechanisms excel at modeling long-distance relationships, their computational complexity scales quadratically with image resolution, limiting their applicability to high-resolution remote sensing images. Current cloud removal methods have mitigated this issue by restricting the global receptive field to smaller regions or adopting channel attention to model long-range relationships. However, these methods either compromise pixel-level long-range dependencies or lose spatial information, potentially leading to structural inconsistencies in restored images. In this work, we propose the focused Taylor attention (FT-Attention), which captures pixel-level long-range relationships without limiting the spatial extent of attention and achieves $\mathcal{O}(N)$ computational complexity, where $N$ denotes the image resolution. Specifically, we utilize Taylor series expansions to reduce the computational complexity of the attention mechanism from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N)$, enabling efficient capture of pixel relationships directly in high-resolution images. Additionally, to fully leverage informative pixels, we develop a new normalization function for the query and key, which produces more distinguishable attention weights, enhancing focus on important features. Building on FT-Attention, we design a U-Net-style network, termed CR-former, specifically for cloud removal. Extensive experimental results on representative cloud removal datasets demonstrate the superior performance of our CR-former. The code is available at https://github.com/wuyang2691/CR-former.
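To illustrate the complexity reduction the abstract describes, the following is a minimal NumPy sketch of linearizing attention with a first-order Taylor expansion, $\exp(q \cdot k) \approx 1 + q \cdot k$, on L2-normalized queries and keys. This is a generic illustration of the Taylor-linearization idea, not the paper's FT-Attention: the "focused" normalization function and any higher-order terms used in CR-former are not reproduced here, and the function name and shapes are assumptions.

```python
import numpy as np

def taylor_linear_attention(Q, K, V):
    """O(N d^2) attention via the first-order Taylor expansion
    exp(q . k) ~= 1 + q . k (an assumed generic variant, not the
    paper's FT-Attention). Q, K, V have shape (N, d)."""
    # L2-normalize q and k so q . k lies in [-1, 1], keeping the
    # approximated weights 1 + q . k non-negative and stable.
    Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    K = K / np.linalg.norm(K, axis=-1, keepdims=True)
    N, d = Q.shape
    # Aggregate keys/values once instead of forming the N x N matrix:
    kv = K.T @ V            # (d, d) summary of all key-value pairs
    k_sum = K.sum(axis=0)   # (d,)  summary of all keys
    numerator = V.sum(axis=0) + Q @ kv   # (N, d)
    denominator = N + Q @ k_sum          # (N,)  normalization per query
    return numerator / denominator[:, None]
```

By associativity, $Q(K^{\top}V)$ replaces $(QK^{\top})V$, so the cost is linear in the number of pixels $N$ rather than quadratic; the result matches the explicit $N \times N$ computation with weights $1 + q_i \cdot k_j$ exactly.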