Abstract: Recently, Transformers have shown impressive performance in image super-resolution (SR), owing to the strong representation ability of multi-head self-attention (MSA). However, existing methods typically compute MSA within a single range and at a single granularity, which prevents the model from capturing sufficient relationships between pixels and thus limits its representation ability. To address this issue, we propose the Multi-range and Mix-grained Transformer (M2TSR) for accurate image SR. In particular, we develop the Multi-range and Mix-grained Transformer Block (M2TB), which constructs diverse MSAs to extract distinct relationships under various ranges and granularities. The short-range MSA_S focuses on extracting local relationships, while the long-range MSA_L expands the attention range to capture the complex relationships inherent in long-range pixels. Meanwhile, fine-grained and coarse-grained features are employed to model distinct relationships at various granularities. Extensive experiments demonstrate the superiority of M2TSR over state-of-the-art methods. The lightweight variant, M2TSR-S, also achieves a better trade-off between performance and computational cost compared with advanced lightweight methods.
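The abstract does not specify how the short-range and long-range branches are realized, so the following is only a minimal sketch of one plausible multi-range attention design, not the authors' implementation: short-range window attention on the full-resolution (fine-grained) features plus attention over a downsampled (coarse-grained) copy to extend the effective range. The function names `window_attention` and `multi_range_block` and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def window_attention(x, window):
    """Self-attention computed independently inside non-overlapping windows.

    x: (B, H, W, C) feature map; window: side length of each square window.
    Projections and multi-head splitting are omitted for brevity.
    """
    B, H, W, C = x.shape
    # Partition into (B * num_windows, window*window, C) token groups.
    xw = x.view(B, H // window, window, W // window, window, C)
    xw = xw.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)
    # Plain scaled dot-product attention within each window.
    attn = F.softmax(xw @ xw.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = attn @ xw
    # Reverse the window partition back to (B, H, W, C).
    out = out.view(B, H // window, W // window, window, window, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

def multi_range_block(x, short_window=8, long_window=16):
    """Fuse a short-range (fine-grained) branch with a long-range branch.

    The long-range branch attends over a 2x average-pooled (coarser) copy of
    the features and upsamples the result back -- one way to trade granularity
    for range; the actual M2TB design may differ.
    """
    short = window_attention(x, short_window)
    coarse = F.avg_pool2d(x.permute(0, 3, 1, 2), 2)            # (B, C, H/2, W/2)
    long_ = window_attention(coarse.permute(0, 2, 3, 1), long_window // 2)
    long_ = F.interpolate(long_.permute(0, 3, 1, 2), scale_factor=2,
                          mode='nearest').permute(0, 2, 3, 1)
    return x + short + long_                                    # residual fusion

if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 32)       # toy (B, H, W, C) feature map
    print(multi_range_block(feat).shape)    # torch.Size([1, 64, 64, 32])
```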