Abstract: As frame-level bit allocation is a dependent sequential decision-making problem, it can be modeled as a Markov Decision Process (MDP) and solved by reinforcement learning (RL). Existing reports using RL for coding mainly have two problems: unrepresentative handcrafted features and limited interaction efficiency. In this paper, to address these two problems, we first propose to use a deep neural network to extract features from the state, which is designed as a combination of raw pixel-level information and handcrafted features. The network is able to extract information necessary for the following decision-making neural network automatically. We then propose an efficient training scheme. By designing a parallel interaction algorithm and using a simplified video encoder, our proposed scheme can train on a sophisticated encoder such as VTM. To the best of our knowledge, this is the first RL algorithm implemented for bit allocation in VTM. The experimental results show that our scheme can learn interpretable frame-level bit allocation and achieves a better rate-distortion (R-D) performance compared with VTM.
Loading