Abstract: End-to-end learned image compression exploits the expressive power of nonlinear transform modules to decorrelate the spatial redundancy of image content. Thanks to their long-range attention scheme, transformer-based transforms can capture more global features for better reconstruction. However, transformer modules incur considerable computational cost, and the coarse use of transformers in learned image compression cannot achieve satisfactory coding efficiency. In this paper, we propose a novel graph-structured swin-transformer for learned image compression, shown in Figure 1. We assume that the global receptive field of the attention map should be sparse rather than dense, while local neighboring correlations must remain strong.
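To make the stated assumption concrete, the following is a minimal sketch (not the paper's implementation) of attention restricted to a dense local window plus a few sparse long-range links; the function names, the random choice of global links, and all parameter values are illustrative assumptions standing in for the learned graph structure.

```python
# Minimal sketch: dense local window + sparse global links in the attention mask.
# All names and the random global-link selection are illustrative assumptions.
import torch


def sparse_global_local_mask(num_tokens: int, window: int,
                             num_global_links: int, seed: int = 0) -> torch.Tensor:
    """Boolean mask (True = attention allowed).

    Local band: every token attends to neighbors within `window`
    (strong local correlations). Sparse links: each token additionally
    attends to a few distant tokens (a stand-in for graph edges).
    """
    g = torch.Generator().manual_seed(seed)
    idx = torch.arange(num_tokens)
    # Dense local neighborhood.
    mask = (idx[:, None] - idx[None, :]).abs() <= window
    mask = mask.clone()
    # Sparse global connections per token.
    for i in range(num_tokens):
        far = torch.randperm(num_tokens, generator=g)[:num_global_links]
        mask[i, far] = True
    return mask


def masked_attention(q, k, v, mask):
    # Scaled dot-product attention restricted to the allowed pairs.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


tokens, dim = 64, 32
q = k = v = torch.randn(tokens, dim)
mask = sparse_global_local_mask(tokens, window=3, num_global_links=4)
out = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([64, 32])
```

Under this sketch, the attention map stays mostly empty (sparse global receptive field) while every token still attends densely to its immediate neighbors, which is the behavior the abstract argues for.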