Yongxi Tan, Jin Yang, Xin Chen, Qitao Song, Yunjun Chen, Zhangxiang Ye, Zhenqiang Su

Feb 12, 2018 (modified: Feb 12, 2018) ICLR 2018 Workshop Submission readers: everyone
  • Abstract: The mobile network that millions of people use every day is one of the most complex systems in the real world. Optimizing mobile networks to meet exploding customer demand while reducing CAPEX/OPEX poses greater challenges than those addressed in prior work. Learning to solve complex real-world problems to benefit everyone and make the world better has long been an ultimate goal of AI. However, this remains an unsolved problem for deep reinforcement learning (DRL), given incomplete/imperfect information in the real world, huge state/action spaces, the large amount of data needed for training and the associated time/cost, interactions among multiple agents, potential negative impact on the real world, etc. To bridge this reality gap, we propose a DRL framework that directly transfers the optimal policy learned from multiple tasks in a source domain to unseen similar tasks in a target domain, without any further training in either domain. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor to best characterize the partially observed mobile network. Second, inspired by AlphaGo, we use a novel self-play mechanism that empowers the DRL agent to gradually improve its intelligence by competing for the best record on multiple tasks. Third, we propose a decentralized DRL method that coordinates multiple agents to compete and cooperate as a team to maximize global reward and minimize potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks and 6 field trials over 4 commercial mobile networks in the real world, we demonstrate the capability of our approach to directly transfer learning from one simulator to another, and from simulation to the real world. This is the first time that a DRL agent has successfully transferred its learning directly from simulation to very complex real-world problems with incomplete and imperfect information, huge state/action spaces, and multi-agent interactions.
  • TL;DR: Sim-to-real optimization of complex real-world mobile networks with imperfect information via multi-agent, multi-task deep reinforcement learning from self-play
  • Keywords: deep reinforcement learning, deep learning, transfer learning, reality gap, artificial intelligence, mobile network, optimization, real world, sim-to-real, DRL, neural network, multi-agent, multi-task, imperfect information, simulation, CCO, coverage and capacity optimization, self play, competitive, cooperative, competition, cooperation, coordination, self organizing network, network optimization
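
The abstract describes a self-play mechanism in which the agent improves by "competing for best record on multiple tasks" but gives no details. The following is only a minimal sketch of one way such a record-based reward could look, not the authors' implementation. The class name `BestRecordTracker`, the +1 / 0 / -1 reward values, and the use of a single scalar score per task are all assumptions made for illustration.

```python
from typing import Dict


class BestRecordTracker:
    """Hypothetical helper: keeps the best score seen so far for each task."""

    def __init__(self) -> None:
        # Maps a task identifier to the best score recorded on that task.
        self.best: Dict[str, float] = {}

    def reward(self, task_id: str, score: float) -> float:
        """Compare a new score against the stored record for the task.

        Assumed reward scheme (not taken from the paper):
        +1.0 if the score beats the record (the record is then updated),
        -1.0 if it falls short, 0.0 on a tie or on the first attempt.
        """
        if task_id not in self.best:
            # First attempt on this task: it sets the initial record.
            self.best[task_id] = score
            return 0.0
        record = self.best[task_id]
        if score > record:
            self.best[task_id] = score
            return 1.0
        if score < record:
            return -1.0
        return 0.0


tracker = BestRecordTracker()
first = tracker.reward("task_a", 0.5)   # sets the record
better = tracker.reward("task_a", 0.7)  # beats the record
worse = tracker.reward("task_a", 0.6)   # falls short of the new record
```

Because the opponent in this sketch is the agent's own history on each task, the signal gets harder to earn as the records improve, which is one plausible reading of "gradually improve its intelligence".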