Abstract: End-to-end learned image compression exploits the expressive power of nonlinear transform modules to decorrelate the spatial redundancy of image content. Thanks to their long-range attention scheme, transformer-based transforms can capture more global features for better reconstruction. However, transformer modules incur considerable computational cost, and the coarse use of transformers in learned image compression cannot achieve satisfactory coding efficiency. In this paper, we propose a novel graph-structured swin-transformer for learned image compression, shown in Figure 1. We assume that the global receptive field of the attention map should be sparse rather than dense, while local neighboring correlations must remain strong.
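To make the stated assumption concrete, the following is a minimal sketch (not the paper's implementation) of attention restricted to a dense local window plus a few sparse long-range links; the function names, the random choice of global links, and all parameter values are illustrative assumptions standing in for the learned graph structure.

```python
# Minimal sketch: dense local window + sparse global links in the attention mask.
# All names and the random global-link selection are illustrative assumptions.
import torch


def sparse_global_local_mask(num_tokens: int, window: int,
                             num_global_links: int, seed: int = 0) -> torch.Tensor:
    """Boolean mask (True = attention allowed).

    Local band: every token attends to neighbors within `window`
    (strong local correlations). Sparse links: each token additionally
    attends to a few distant tokens (a stand-in for graph edges).
    """
    g = torch.Generator().manual_seed(seed)
    idx = torch.arange(num_tokens)
    # Dense local neighborhood.
    mask = (idx[:, None] - idx[None, :]).abs() <= window
    mask = mask.clone()
    # Sparse global connections per token.
    for i in range(num_tokens):
        far = torch.randperm(num_tokens, generator=g)[:num_global_links]
        mask[i, far] = True
    return mask


def masked_attention(q, k, v, mask):
    # Scaled dot-product attention restricted to the allowed pairs.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


tokens, dim = 64, 32
q = k = v = torch.randn(tokens, dim)
mask = sparse_global_local_mask(tokens, window=3, num_global_links=4)
out = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([64, 32])
```

Under this sketch, the attention map stays mostly empty (sparse global receptive field) while every token still attends densely to its immediate neighbors, which is the behavior the abstract argues for.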