Abstract: It is generally believed that model-based reinforcement learning (RL) is more sample-efficient than model-free RL. However, model-based RL methods typically suffer from model bias, which severely limits their asymptotic performance. Although previous model-based RL approaches use ensemble models to reduce model error, we find that vanilla ensemble learning does not account for the discrepancy among the models: the disagreement between different members can be large, which hinders policy optimization. To alleviate this problem, this paper proposes an Ensemble Model Consistency Actor-Critic (EMC-AC) method that decreases the discrepancy between models while maintaining model diversity. We further design ablation experiments to analyze how the trade-off between diversity and consistency affects the performance of EMC-AC. Finally, extensive experiments on continuous control benchmarks demonstrate that our approach exceeds the sample efficiency of prior model-based RL methods and matches the asymptotic performance of state-of-the-art model-free RL algorithms.
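To make the diversity/consistency trade-off concrete, below is a minimal sketch of one plausible way an ensemble of dynamics models could be trained with a consistency penalty on cross-member disagreement. The abstract does not specify the EMC-AC objective, so the loss form, the `consistency_weight` parameter, and all names here are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch (assumption): trading off ensemble diversity against
# consistency by penalizing disagreement among member predictions.
# The exact EMC-AC objective is not given in the abstract.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One ensemble member: predicts the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ensemble_loss(models, s, a, s_next, consistency_weight=0.1):
    """Per-model prediction loss plus a penalty on cross-model disagreement.

    A larger consistency_weight shrinks the discrepancy between members;
    a smaller one preserves diversity (the trade-off studied in the ablations).
    """
    preds = torch.stack([m(s, a) for m in models])           # (K, B, state_dim)
    pred_loss = ((preds - s_next.unsqueeze(0)) ** 2).mean()  # fit the observed transitions
    consistency = preds.var(dim=0).mean()                    # disagreement across the ensemble
    return pred_loss + consistency_weight * consistency

# Usage: train K models on (possibly bootstrapped) transition batches.
if __name__ == "__main__":
    K, B, state_dim, action_dim = 5, 32, 11, 3
    models = [DynamicsModel(state_dim, action_dim) for _ in range(K)]
    params = [p for m in models for p in m.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)
    s = torch.randn(B, state_dim)
    a = torch.randn(B, action_dim)
    s_next = torch.randn(B, state_dim)
    loss = ensemble_loss(models, s, a, s_next)
    loss.backward()
    opt.step()
```

In this sketch, setting `consistency_weight = 0` recovers vanilla ensemble learning, while increasing it pushes the members toward agreement; the policy-optimization (actor-critic) side of EMC-AC is omitted here.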