Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Tianchi Huang, Rui-Xiao Zhang, Lifeng Sun

2022 (modified: 18 Apr 2023) | IEEE Trans. Multim. 2022 | Readers: Everyone

Abstract: Video transmission services adopt adaptive algorithms to meet users’ demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that such a function often fails to describe the requirement accurately, so the resulting methods violate the requirement they were meant to satisfy. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and immediately estimates the win rate with respect to the competition outcome, where the outcome indicates which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei optimizes itself faithfully according to the assigned requirement, outperforming state-of-the-art methods under all considered scenarios. Moreover, we propose Zwei$^+$, which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei$^+$ in the adaptive bitrate streaming task and show its superiority over existing state-of-the-art approaches.
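
The win-rate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the round-robin pairing, the half-point tie handling, and the example requirement ("less rebuffering first, then higher bitrate") are all assumptions made for illustration. The sketch takes the outcomes of several trajectories rolled out from the same initial state, lets every pair compete under a requirement-specific comparison rule, and reports each trajectory's fraction of wins.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical outcome summary of one trajectory:
# (total rebuffering time in seconds, average bitrate in kbps).
Outcome = Tuple[float, float]


def estimate_win_rates(
    outcomes: Sequence[Outcome],
    compare: Callable[[Outcome, Outcome], int],
) -> List[float]:
    """Return, per trajectory, the fraction of pairwise competitions it wins.

    `compare(a, b)` returns a positive value if `a` is closer to the
    requirement, a negative value if `b` is, and 0 for a tie.
    Ties count as half a win for each side (an assumption of this sketch).
    """
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        result = compare(outcomes[i], outcomes[j])
        if result > 0:
            scores[i] += 1.0
        elif result < 0:
            scores[j] += 1.0
        else:
            scores[i] += 0.5
            scores[j] += 0.5
    # Each trajectory plays n - 1 competitions.
    return [s / (n - 1) for s in scores]


def prefer_low_rebuffer_then_high_bitrate(a: Outcome, b: Outcome) -> int:
    """Example requirement (assumed): less rebuffering wins; bitrate breaks ties."""
    if a[0] != b[0]:
        return 1 if a[0] < b[0] else -1
    if a[1] != b[1]:
        return 1 if a[1] > b[1] else -1
    return 0


# Three made-up trajectories sampled from the same initial state.
outcomes = [(0.0, 1200.0), (2.5, 3000.0), (0.0, 1850.0)]
win_rates = estimate_win_rates(outcomes, prefer_low_rebuffer_then_high_bitrate)
print(win_rates)
```

In a training loop, win rates of this kind could stand in for the return of each trajectory when updating the policy; how Zwei actually uses them is specified in the paper, not here.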
