Keywords: Static Friction, Off-policy Reinforcement Learning, Batch-Constrained Q-Learning
Abstract: We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy reinforcement learning, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control that extends batch-constrained reinforcement learning. Our algorithm constrains the agent's action space to encourage behavior similar to that in the replay buffer, while maintaining a distance from the manifold of the orthonormal action space. The constraint preserves the simplicity of batch-constrained Q-learning and provides an intuitive physical interpretation of extrapolation error. Empirically, we demonstrate that our algorithm trains robustly and achieves competitive performance across standard continuous control benchmarks.
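For context, below is a minimal PyTorch-style sketch of batch-constrained (BCQ-style) action selection, the mechanism the abstract says the algorithm extends; the paper's friction-based constraint itself is not reproduced here. The names `candidate_sampler`, `phi`, and the network sizes are illustrative assumptions, not the submission's actual implementation.

```python
# Sketch of BCQ-style batch-constrained action selection (Fujimoto et al., 2019),
# the baseline mechanism the abstract builds on. The paper's static-friction
# constraint would be an additional restriction on top of this; its exact form
# is not shown here.
import torch
import torch.nn as nn

class Perturbation(nn.Module):
    """Small correction to a buffer-like candidate action, clipped to +/- phi."""
    def __init__(self, state_dim, action_dim, max_action, phi=0.05):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action, self.phi = max_action, phi

    def forward(self, state, action):
        xi = self.phi * self.max_action * self.net(torch.cat([state, action], dim=-1))
        # Keeping |xi| <= phi * max_action constrains the policy to stay near
        # actions seen in the replay buffer -- the batch-constrained idea.
        return (action + xi).clamp(-self.max_action, self.max_action)

def select_action(state, candidate_sampler, perturb, q_net, n_candidates=10):
    """Pick the highest-value action among buffer-like candidates.

    candidate_sampler(states) should return one action per state resembling
    the replay buffer's behavior (BCQ uses a VAE decoder for this role).
    """
    states = state.unsqueeze(0).repeat(n_candidates, 1)
    candidates = perturb(states, candidate_sampler(states))
    q_values = q_net(torch.cat([states, candidates], dim=-1)).squeeze(-1)
    return candidates[q_values.argmax()]
```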
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17551