Highlights

• We propose HiPPO, a novel highlight replay method that enhances proximal policy optimization.
• We selected three key properties as the basis for highlight replay.
• The introduced reward-constrained optimization relaxes the policy-similarity constraint.
• HiPPO outperforms state-of-the-art approximate policy algorithms on MuJoCo.