Abstract: Video compression is a crucial technology for video representation, transmission, and storage. Recently, learned video compression has received great attention from both industry and academia because of the potential of neural networks, and an increasing number of researchers are developing end-to-end learned video compression frameworks. Across all of these methods, how to represent the motion information in a video is one of the essential questions: a robust and efficient motion representation helps the codec compress video better. In this paper, we propose an Attention-guided Motion Encoder (AME) to obtain a compact motion representation. The proposed method uses attention to guide the network in exploring the relationship between frames, and it leverages multi-scale features to realize a coarse-to-fine mechanism, allowing the network to compress temporal information without losing crucial spatial information. Furthermore, a practical video codec must not only perform well but also consume few computational resources. Experimental results show that the proposed model outperforms existing learned and conventional video codecs on the UVG, MCL-JCV, and HEVC standard test sequences in both PSNR and MS-SSIM, while also delivering a larger improvement per model parameter.
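A minimal PyTorch sketch of the kind of attention-guided, multi-scale motion encoder the abstract describes is shown below. The module names, channel sizes, number of scales, the two-frame input, and the gating form of attention are all illustrative assumptions, not the authors' exact architecture:

```python
# Sketch of an attention-guided, multi-scale motion encoder, assuming two
# consecutive RGB frames as input. All design choices here (channel widths,
# three scales, sigmoid spatial gating) are illustrative, not the paper's.
import torch
import torch.nn as nn


class AttentionGuidedMotionEncoder(nn.Module):
    def __init__(self, channels=(64, 128, 192)):
        super().__init__()
        in_ch = 6  # two stacked RGB frames
        self.stages = nn.ModuleList()
        self.attns = nn.ModuleList()
        for out_ch in channels:
            # Stride-2 conv halves resolution at each scale (coarse-to-fine pyramid).
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            ))
            # A spatial attention map gates each scale's features, letting the
            # network focus on regions where inter-frame motion matters.
            self.attns.append(nn.Sequential(
                nn.Conv2d(out_ch, 1, kernel_size=1),
                nn.Sigmoid(),
            ))
            in_ch = out_ch

    def forward(self, frame_prev, frame_cur):
        x = torch.cat([frame_prev, frame_cur], dim=1)
        features = []
        for stage, attn in zip(self.stages, self.attns):
            x = stage(x)
            x = x * attn(x)  # attention-weighted features at this scale
            features.append(x)
        # The coarsest map serves as the compact motion representation;
        # the intermediate maps can feed a coarse-to-fine decoder.
        return features


if __name__ == "__main__":
    enc = AttentionGuidedMotionEncoder()
    f0 = torch.randn(1, 3, 256, 256)
    f1 = torch.randn(1, 3, 256, 256)
    feats = enc(f0, f1)
    print([tuple(f.shape) for f in feats])  # multi-scale motion features
```

In this sketch, keeping the attention-weighted features from every scale is what realizes the coarse-to-fine idea: coarse maps summarize global motion compactly while finer maps preserve the spatial detail the abstract says should not be lost.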