Abstract: Traditional video codecs follow a predictive coding architecture built on motion-compensated prediction and residual transform coding. Inspired by recent advances in deep learning, we propose a new deep learning video compression architecture that does not require motion estimation, the most expensive component of traditional video codecs. Our network consists of three components: a Displacement Calculation Unit (DCU), a Displacement Compression Network (DCN), and a Frame Reconstruction Network (FRN). The DCU uses displaced frame differences as motion information, removing the need for the motion estimation found in hybrid codecs. The DCN uses an RNN-based network to learn temporal dependencies between frames. In the FRN, we propose a new version of the UNet model, called LSTM-UNet, and use it to learn space-time differential representations of videos. Our experimental results show that our compression model, the MOtionless VIdeo Codec (MOVI-Codec), learns to compress videos efficiently without computing motion. It outperforms the H.264 video coding standard and exceeds the performance of the modern global standard HEVC as measured by MS-SSIM, especially on higher-resolution videos.
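To make the idea of displaced frame differences concrete, the following is a minimal NumPy sketch, not the DCU as implemented in this work. It subtracts spatially shifted copies of the previous frame from the current frame, so no motion search is involved. The function name `displaced_frame_differences`, the five offsets (no shift, up, down, left, right), the `step` parameter, and the edge-replicating padding are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def displaced_frame_differences(prev, curr, step=1):
    """Stack the differences between `curr` and shifted copies of `prev`.

    Hypothetical helper: the offsets and padding below are assumptions
    made for illustration, not details taken from the abstract.
    """
    if prev.shape != curr.shape:
        raise ValueError("frames must have the same shape")
    prev = prev.astype(np.float32)
    curr = curr.astype(np.float32)
    h, w = prev.shape[:2]

    # Pad only the two spatial axes; replicate edge pixels so that shifted
    # windows never wrap around the frame border.
    pad = [(step, step), (step, step)] + [(0, 0)] * (prev.ndim - 2)
    padded = np.pad(prev, pad, mode="edge")

    # (dy, dx) offsets: no displacement, up, down, left, right.
    offsets = [(0, 0), (-step, 0), (step, 0), (0, -step), (0, step)]
    diffs = []
    for dy, dx in offsets:
        # Window of the padded previous frame, displaced by (dy, dx).
        shifted = padded[step + dy:step + dy + h, step + dx:step + dx + w]
        diffs.append(curr - shifted)
    return np.stack(diffs)


# Usage: the current frame is the previous frame moved one pixel to the right.
prev_frame = np.arange(16, dtype=np.float32).reshape(4, 4)
curr_frame = np.pad(prev_frame, ((0, 0), (1, 0)), mode="edge")[:, :4]
dfd = displaced_frame_differences(prev_frame, curr_frame)
print(dfd.shape)
print(np.abs(dfd).sum(axis=(1, 2)))  # per-offset residual energy
```

In this toy case the difference taken against the matching displacement vanishes, while the plain frame difference does not. That is the sense in which a small set of fixed displacements can carry motion information to a learned network without an explicit search.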