- Keywords: Model-based reinforcement learning, Multi-teacher knowledge distillation, Ensemble learning
- Abstract: Although reinforcement learning (RL) excels at solving decision-making problems, it requires collecting a large amount of environment data and is time-consuming in both training and interaction, which makes it hard to apply in real-world applications. To reduce the time cost and improve data efficiency, model-based reinforcement learning uses a learned system model to predict system dynamics (i.e., states or rewards) and plans accordingly, thus avoiding frequent environment interaction. However, model-based methods suffer from the model-bias problem, where the learned model is inaccurate in certain regions of the state space, causing variation in policy learning and degrading system performance. We propose a Multi-teacher MOdel BAsed Reinforcement Learning algorithm (MOBA), which leverages multi-teacher knowledge distillation to address the model-bias problem. Specifically, different teachers explore different regions of the state space and learn different instances of the system. By distilling and transferring the teachers' knowledge to a student, the student model learns a generalized dynamics model that covers the state space. Moreover, to overcome the instability of multi-teacher knowledge transfer, we learn a set of student models and use an ensemble method to jointly predict system dynamics. We evaluate MOBA on high-dimensional locomotion control tasks. Results show that, compared with state-of-the-art (SOTA) model-free methods, our method improves data efficiency and system performance by up to 75% and 10%, respectively. Moreover, our method outperforms other SOTA model-based approaches by up to 63.2% when exposed to high-range model-bias environments.
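
The teacher/student/ensemble pipeline described in the abstract can be illustrated with a toy sketch. This is not the MOBA implementation: the 1-D system `true_dynamics`, the three state-space regions, the linear teacher and cubic student feature maps, the least-squares fitting, and the bootstrap-based ensemble are all assumptions chosen to keep the example small and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    # Hypothetical 1-D system, used only to generate illustrative data.
    return 0.9 * s - 0.05 * s**3 + 0.1 * a

def linear_feats(s, a):
    # Low-capacity teacher features: accurate only near the training region.
    return np.stack([s, a, np.ones_like(s)], axis=1)

def cubic_feats(s, a):
    # Higher-capacity student features (an assumption of this sketch).
    return np.stack([s, s**2, s**3, a, np.ones_like(s)], axis=1)

def fit(feats, s, a, y):
    # Ordinary least-squares fit of a linear-in-features dynamics model.
    w, *_ = np.linalg.lstsq(feats(s, a), y, rcond=None)
    return w

# Each teacher only sees transitions from one region of the state space.
regions = [(-2.0, -0.67), (-0.67, 0.67), (0.67, 2.0)]
teachers = []
for lo, hi in regions:
    s = rng.uniform(lo, hi, 200)
    a = rng.uniform(-1.0, 1.0, 200)
    y = true_dynamics(s, a) + rng.normal(0.0, 0.01, 200)
    teachers.append(fit(linear_feats, s, a, y))

# Distillation set: each teacher labels fresh inputs from its own region.
ds, da, dy = [], [], []
for (lo, hi), w in zip(regions, teachers):
    s = rng.uniform(lo, hi, 200)
    a = rng.uniform(-1.0, 1.0, 200)
    ds.append(s)
    da.append(a)
    dy.append(linear_feats(s, a) @ w)
ds, da, dy = map(np.concatenate, (ds, da, dy))

# Ensemble of students, each distilled from a bootstrap resample.
students = []
for _ in range(5):
    idx = rng.integers(0, len(ds), len(ds))
    students.append(fit(cubic_feats, ds[idx], da[idx], dy[idx]))

def ensemble_predict(s, a):
    # Average the students' next-state predictions.
    return np.mean([cubic_feats(s, a) @ w for w in students], axis=0)

# Compare errors over the whole state space.
s_test = rng.uniform(-2.0, 2.0, 1000)
a_test = rng.uniform(-1.0, 1.0, 1000)
y_test = true_dynamics(s_test, a_test)
teacher_mse = [np.mean((linear_feats(s_test, a_test) @ w - y_test) ** 2)
               for w in teachers]
ensemble_mse = np.mean((ensemble_predict(s_test, a_test) - y_test) ** 2)
print("teacher MSEs:", teacher_mse)
print("student-ensemble MSE:", ensemble_mse)
```

In this toy setting each teacher is a local model that extrapolates poorly outside its region, which stands in for model bias. The students are fit only to teacher labels, never to the true targets, and averaging several students is one simple way to damp the variance of any single distilled model.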