Abstract: Deep reinforcement learning (DRL) has demonstrated promising potential for adaptive video streaming. However, existing DRL-based methods for adaptive video streaming use only application (APP) layer information and rely on heuristic training methods. This paper aims to boost the quality of experience (QoE) of adaptive wireless video streaming by exploiting lower-layer information and deriving a rigorous training method. First, we formulate a more comprehensive and accurate adaptive wireless video streaming problem as an infinite-stage discounted Markov decision process (MDP) by additionally incorporating past and lower-layer information, which allows a flexible tradeoff between QoE and the computational and memory costs of solving the problem. Then, we propose an enhanced asynchronous advantage actor-critic (eA3C) method that jointly optimizes the parameters of the parameterized policy and value function. Specifically, we build an eA3C network consisting of a policy network and a value network that can exploit cross-layer, past, and current information, and we jointly train the eA3C network using pre-collected samples. Finally, experimental results show that the proposed eA3C method improves QoE by 6.8% $\sim$ 14.4% over state-of-the-art methods.
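To make the actor-critic structure described above concrete, the following is a minimal PyTorch sketch of a network with a policy head and a value head over a cross-layer state, trained with an A3C-style joint loss. The class names, feature layout (APP-layer buffer and throughput history plus assumed lower-layer indicators such as SNR/CQI), and hyperparameters are illustrative assumptions, not the paper's actual eA3C architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Hypothetical actor-critic over a cross-layer streaming state."""
    def __init__(self, state_dim: int, num_bitrates: int, hidden: int = 128):
        super().__init__()
        # Shared trunk over the concatenated state (buffer level, past
        # throughput, lower-layer indicators -- assumed feature layout).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, num_bitrates)  # actor: bitrate choice
        self.value_head = nn.Linear(hidden, 1)              # critic: state value

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        log_probs = F.log_softmax(self.policy_head(h), dim=-1)
        values = self.value_head(h).squeeze(-1)
        return log_probs, values

def a3c_loss(log_probs, values, actions, returns,
             entropy_coef: float = 0.01, value_coef: float = 0.5):
    """Joint actor-critic loss on a batch of pre-collected transitions.

    `returns` are discounted QoE-based returns (assumed reward definition);
    the baseline V(s) is subtracted to form the advantage.
    """
    advantage = returns - values.detach()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantage).mean()               # policy gradient term
    value_loss = F.mse_loss(values, returns)                 # critic regression term
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()  # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In this sketch the policy and value parameters share a trunk and are updated together by a single backward pass through `a3c_loss`, mirroring the joint optimization of policy and value function that the abstract attributes to eA3C; the actual network inputs, horizon, and reward in the paper may differ.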