Abstract: Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be taken dynamically to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies that decide which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP), and by using a Lagrangian relaxation we decompose it into single-client problems. Moreover, the optimal policy takes a threshold form in the video buffer length. We then derive a natural policy gradient (NPG) based constrained reinforcement learning (CRL) algorithm that exploits the structure of our problem, and show that it converges to the globally optimal policy. Next, we develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. Using YouTube media streaming experiments, we demonstrate that our policy can increase the user quality of experience by over 30%. Furthermore, we show that the structured learning is fast and that the learned policy can be easily deployed, taking only about 15 μs to execute.
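To illustrate what a threshold policy in the video buffer length might look like, the following is a minimal sketch and not the algorithm derived in this work. It assumes that a client is prioritized when its buffer falls below the threshold (the abstract does not state the direction), that buffer length is measured in seconds, and that the same rule is applied to each client independently, mirroring the decomposition into single-client problems. The names `prioritize`, `select_clients`, and the client identifiers are hypothetical.

```python
from typing import Dict, List


def prioritize(buffer_s: float, threshold_s: float) -> bool:
    """Single-client threshold rule.

    Assumption: a client is prioritized when its buffered video
    (in seconds) is strictly below the threshold.
    """
    return buffer_s < threshold_s


def select_clients(buffers_s: Dict[str, float], threshold_s: float) -> List[str]:
    """Apply the single-client rule to every client independently.

    Returns the prioritized client names in sorted order.
    """
    return sorted(
        client for client, buf in buffers_s.items()
        if prioritize(buf, threshold_s)
    )


# Hypothetical buffer states (seconds of buffered video per client).
buffers = {"client_a": 2.0, "client_b": 12.5, "client_c": 4.9}
selected = select_clients(buffers, threshold_s=5.0)
print(selected)
```

Because such a rule reduces to one comparison per client, it is consistent with the very small execution time reported above; in a learned setting the threshold would be the parameter being trained rather than a fixed constant.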