Abstract: Transformer is leading a trend in the field of image processing. While existing lightweight image processing transformers have achieved notable success, they primarily focus on reducing FLOPs (floating-point operations) or the number of parameters, rather than on practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that substitutes memory-intensive operators with the combination of self-attention and convolutions to achieve practical speedup. Specifically, we propose a novel non-volatile sparse masking self-attention (NVSM-SA) that utilizes a pre-computing sparse mask to capture contextual information from a larger window with no extra computation overload. Besides, a high-frequency reparameterization module (HRM) is proposed to make LIPT block reparameterization friendly, enhancing the model’s ability to reconstruct fine details. Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT on both latency and PSNR. LIPT achieves real-time GPU inference with state-of-the-art performance on multiple image SR benchmarks. The source codes are released at https://github.com/Lucien66/LIPT
External IDs:doi:10.1109/tip.2025.3567832
Loading