Keywords: Reinforcement Learning, Flow-based Model, Safe Reinforcement Learning, Prospective Thinking, Data Efficiency
TL;DR: This paper introduces a prospective-thinking model-free RL method that predicts future states and plans ahead. It further enhances safety and data efficiency through cycle-consistency constraints.
Abstract: Prospective thinking (PT) is an inherent human ability that guides planning ahead for decision making and is key to acting efficiently. However, current reinforcement learning (RL) methods lack PT in decision learning: without planning ahead, agents fall into state traps, which further reduces data efficiency. This paper proposes ProSpec, a novel RL method that is, to our knowledge, the first to incorporate prospective decision learning into model-free RL for efficient and safe exploration. Specifically, to bring PT into model-free RL, we propose a flow-based reversible dynamics model that predicts n streams of future trajectories based on the current state and policy.
Meanwhile, to keep the agent out of state traps, we propose a prospective mechanism based on model predictive control with a value-consistency constraint, which enables the agent to learn to plan ahead before executing and thus avoid the "dead ends" caused by high-risk actions. Additionally, to improve data efficiency, we present a cycle-consistency constraint that generates a large number of accurate, reversible virtual trajectories to further enhance state feature representations. Comprehensive evaluations on the DMControl and Atari benchmarks show that ProSpec significantly accelerates decision learning and achieves state-of-the-art performance on 4 of 6 DMControl tasks and 7 of 26 Atari games. The code is available at https://anonymous.4open.science/r/ProSpec-35B8/.
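To make the flow-based reversible dynamics model concrete, below is a minimal sketch using a single affine coupling layer, written in PyTorch. All names here (CouplingDynamics, bidirectional_loss, the state/action dimensions) are illustrative assumptions, not the paper's actual architecture; the backward-recovery term is one plausible form of the cycle-consistency signal described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CouplingDynamics(nn.Module):
    """One affine coupling layer: invertibly maps s_t to s_{t+1} given a_t."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.half = state_dim // 2
        # Conditioner predicts a scale and shift for the second state half.
        self.net = nn.Sequential(
            nn.Linear(self.half + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (state_dim - self.half)),
        )

    def forward(self, s, a):
        s1, s2 = s[:, :self.half], s[:, self.half:]
        log_scale, shift = self.net(torch.cat([s1, a], dim=-1)).chunk(2, dim=-1)
        return torch.cat([s1, s2 * torch.exp(log_scale) + shift], dim=-1)

    def inverse(self, s_next, a):
        # Exact inversion: recover s_t from s_{t+1} and a_t.
        s1, s2n = s_next[:, :self.half], s_next[:, self.half:]
        log_scale, shift = self.net(torch.cat([s1, a], dim=-1)).chunk(2, dim=-1)
        return torch.cat([s1, (s2n - shift) * torch.exp(-log_scale)], dim=-1)

def bidirectional_loss(model, s, a, s_next):
    """Train both directions on an observed transition (s, a, s_next); the
    backward pass recovering s from the real s_next acts as a
    cycle-style consistency term."""
    return (F.mse_loss(model(s, a), s_next)
            + F.mse_loss(model.inverse(s_next, a), s))
```

In practice one would stack several coupling layers with alternating partitions so that both halves of the state can change across a transition.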
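The "plan ahead, then execute" mechanism can likewise be sketched as model predictive control over n simulated trajectory streams, scored by a learned value function. This is a hedged illustration only: the interfaces (policy.sample, value_fn) and the scoring rule are assumptions, and the paper's value-consistency constraint is elided.

```python
def prospective_action(s, model, policy, value_fn, n_streams=8, horizon=5):
    """Roll out n candidate streams with the dynamics model, then execute
    only the first action of the best-scoring stream, steering away from
    low-value "dead ends" before acting."""
    best_ret, best_a = float("-inf"), None
    for _ in range(n_streams):
        s_sim, first_a, ret = s, None, 0.0
        for _ in range(horizon):
            a = policy.sample(s_sim)        # candidate action from the policy
            first_a = a if first_a is None else first_a
            s_sim = model(s_sim, a)         # predicted next state
            ret += value_fn(s_sim).item()   # score the predicted future
        if ret > best_ret:
            best_ret, best_a = ret, first_a
    return best_a
```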
Primary Area: reinforcement learning
Submission Number: 5748