Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR in recent years. However, they often suffer from a heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), which seriously hinders their practical application and deployment. In this work, we present an efficient SR model, dubbed the Entropy Attention and Receptive Field Augmentation network (EARFA), to mitigate the dilemma between model efficiency and SR performance. EARFA is composed of a novel entropy attention (EA) and a shifting large kernel attention (SLKA). From the perspective of information theory, EA increases the entropy of intermediate features conditioned on a Gaussian distribution, providing more informative input for subsequent reasoning. Meanwhile, SLKA extends the receptive field of SR models with the assistance of channel shifting, which also helps boost the diversity of hierarchical features. Since the implementation of EA and SLKA does not involve complex computations (such as extensive matrix multiplications), the proposed method achieves faster nonlinear inference than Transformer-based SR models while maintaining better SR performance. Extensive experiments show that the proposed model significantly reduces inference latency while achieving SR performance comparable to that of other advanced models.
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: 1. From the perspective of information theory, we introduce a novel EARFA model for efficient SISR (ESISR) tasks. It achieves SR performance superior to that of most existing models and faster inference than advanced Transformer-based models.
2. A new attention mechanism (i.e., EA) based on differential entropy is crafted as a new criterion for evaluating the significance of channel-wise features. Unlike traditional attention mechanisms modeled on biological attention in neuroscience, our EA is motivated by information theory: it improves the informativeness of hierarchical features by increasing the differential entropy of intermediate features (a minimal sketch follows after this list).
3. We propose to augment the effective receptive field of the model with a simple yet efficient variant of LKA [27], which replaces the point-wise convolution with a shifting convolution. This substitution not only eliminates the computational overhead of the point-wise convolution, but also increases feature diversity and enlarges the model's receptive field (see the SLKA sketch below).
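As a concrete illustration of item 2, below is a minimal PyTorch sketch of a differential-entropy-based channel attention. It assumes each channel's activations are treated as Gaussian, so the channel entropy reduces to 0.5·log(2πeσ²); the sigmoid normalization and the parameter-free design are our illustrative assumptions, not necessarily the paper's exact formulation.

```python
import math

import torch
import torch.nn as nn


class EntropyAttention(nn.Module):
    """Illustrative sketch of entropy attention (EA).

    Assuming each channel's activations follow a Gaussian distribution,
    the differential entropy of channel c is h_c = 0.5 * log(2*pi*e * var_c).
    Channels with higher entropy are treated as more informative and are
    re-weighted accordingly.
    """

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # avoids log(0) for near-constant channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); per-channel variance over spatial positions
        var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # differential entropy of a Gaussian with that variance
        entropy = 0.5 * torch.log(2 * math.pi * math.e * (var + self.eps))
        # squash to (0, 1) channel weights and rescale the features
        return x * torch.sigmoid(entropy)
```

Note that the weights are computed from channel statistics alone, with no matrix multiplications, consistent with the efficiency claim in item 1.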
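Likewise, a hedged sketch of SLKA, assuming the common LKA [27] decomposition (a 5×5 depth-wise convolution followed by a 7×7 depth-wise dilated convolution with dilation 3) and replacing the final 1×1 point-wise convolution with a parameter-free channel shift; the four-direction grouping and the shift distance are illustrative assumptions.

```python
import torch
import torch.nn as nn


def channel_shift(x: torch.Tensor, shift: int = 1) -> torch.Tensor:
    """Split channels into four groups and shift each group spatially in a
    different direction (zero padding at the borders)."""
    out = torch.zeros_like(x)
    g = x.shape[1] // 4
    out[:, 0 * g:1 * g, shift:, :] = x[:, 0 * g:1 * g, :-shift, :]  # down
    out[:, 1 * g:2 * g, :-shift, :] = x[:, 1 * g:2 * g, shift:, :]  # up
    out[:, 2 * g:3 * g, :, shift:] = x[:, 2 * g:3 * g, :, :-shift]  # right
    out[:, 3 * g:, :, :-shift] = x[:, 3 * g:, :, shift:]            # left
    return out


class ShiftingLKA(nn.Module):
    """Sketch of shifting large kernel attention (SLKA): LKA's depth-wise and
    dilated depth-wise convolutions are kept, while the final point-wise
    (1x1) convolution is replaced by a FLOP-free channel shift."""

    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = self.dw_dilated(attn)
        attn = channel_shift(attn)  # replaces LKA's point-wise convolution
        return x * attn             # attention re-weighting, as in LKA
```

Because the shift only moves memory, it adds no parameters or multiplications, while mixing information across channel groups and spatial offsets enlarges the effective receptive field.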
Supplementary Material: zip
Submission Number: 1734