Sample-Efficient Policy Learning based on Completely Behavior Cloning

Qiming Zou, Ling Wang, Yu Li, Jie Liu

Published: 2019, Last Modified: 06 Feb 2025, SMC 2019, CC BY-SA 4.0

Abstract: Direct policy search is one of the most important classes of algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is prone to poor local optima. To overcome these challenges, this paper proposes a training-free behavior cloning algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC transforms a Model Predictive Control (MPC) controller into a PieceWise Affine (PWA) function using multi-parametric programming, and expresses this function with a neural network. In this way, off-the-shelf deep reinforcement learning algorithms can be used to fine-tune the neural network. The experiments show that our method helps the agent begin learning in high-reward regions of the state space, and that it converges faster and to a better policy.
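To illustrate how a PWA function can be expressed by a neural network without any training, here is a minimal sketch. It is not the construction used in the paper. It assumes a hypothetical one-dimensional saturated linear feedback law (the gain `K` and the bounds `U_MIN`, `U_MAX` are made-up values, not taken from any MPC problem) and writes it as a network with one hidden layer of two ReLU units whose weights are set by hand.

```python
def relu(z):
    """Rectified linear unit for a scalar input."""
    return max(0.0, z)


# Hypothetical values chosen only for illustration (not from the paper).
K = -2.0                    # linear feedback gain
U_MIN, U_MAX = -1.0, 1.0    # actuator limits


def pwa_controller(x):
    """Reference PWA law with three affine pieces: u(x) = clip(K * x, U_MIN, U_MAX)."""
    return min(max(K * x, U_MIN), U_MAX)


def relu_network(x):
    """One hidden layer with two ReLU units; weights are written down, not trained.

    Uses the identity clip(z, lo, hi) = lo + relu(z - lo) - relu(z - hi),
    which holds whenever lo <= hi.
    """
    z = K * x
    h1 = relu(z - U_MIN)    # hidden unit 1
    h2 = relu(z - U_MAX)    # hidden unit 2
    return U_MIN + h1 - h2  # affine output layer


def max_abs_error(points):
    """Largest absolute gap between the PWA law and the network on the given points."""
    return max(abs(pwa_controller(x) - relu_network(x)) for x in points)


# Evaluation grid over the state interval [-3, 3].
GRID = [i / 100.0 for i in range(-300, 301)]

if __name__ == "__main__":
    print("max |pwa - network| on grid:", max_abs_error(GRID))
```

Because the network reproduces the PWA law up to floating-point rounding, it could serve as an initial policy that a standard reinforcement learning algorithm then fine-tunes. A multi-dimensional explicit MPC law has many more regions and would need a larger network, which this sketch does not attempt to cover.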