Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis

Anton Plaksin; Stepan Martyanov

Published: 31 Oct 2022, Last Modified: 16 Dec 2022, NeurIPS 2022 Accept, Readers: Everyone

Keywords: continuous reinforcement learning, deep q-learning, optimal control problems, normalized advantage functions

TL;DR: We propose several modifications of the NAF algorithm for continuous reinforcement learning problems arising from optimal control problems.

Abstract: One of the most effective continuous deep reinforcement learning algorithms is normalized advantage functions (NAF). The main idea of NAF is to approximate the Q-function by functions that are quadratic in the action variable. This makes it possible to apply the algorithm to continuous reinforcement learning problems, but it also raises the question of which classes of problems admit such an approximation. This paper describes one such class. We consider reinforcement learning problems obtained by discretizing certain optimal control problems. Based on the idea of NAF, we present a new family of quadratic functions and prove that it has suitable approximation properties. Taking these properties into account, we provide several ways to improve NAF. The experimental results confirm the efficiency of our improvements.
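
For readers unfamiliar with the quadratic approximation mentioned in the abstract, the following is a minimal sketch of the standard NAF parametrization, Q(s, a) = V(s) - 1/2 (a - mu(s))^T P(s) (a - mu(s)) with P(s) = L(s) L(s)^T. It illustrates the original NAF form only, not the new family of quadratic functions proposed in the paper. The function name `naf_q_value` and the numeric values are hypothetical, and in practice V, mu and L would be outputs of a neural network evaluated at the state.

```python
def naf_q_value(value, mu, L, action):
    """Evaluate Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), where P = L L^T.

    value  : V(s), a float (assumed to come from a value head)
    mu     : greedy action mu(s), a list of floats
    L      : lower-triangular matrix L(s) as a list of rows
    action : the action a at which Q is evaluated
    """
    n = len(mu)
    diff = [action[i] - mu[i] for i in range(n)]
    # (a - mu)^T L L^T (a - mu) equals the squared norm of L^T (a - mu).
    proj = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(p * p for p in proj)
    return value + advantage


# Hypothetical values for a two-dimensional action space.
mu = [0.5, -0.2]
L = [[1.0, 0.0],
     [0.5, 2.0]]
q_at_mu = naf_q_value(1.0, mu, L, mu)              # the advantage vanishes at a = mu
q_shifted = naf_q_value(1.0, mu, L, [1.5, -0.2])   # Q decreases away from mu
```

Because P is positive semidefinite by construction, the advantage term is never positive, so Q is maximized at a = mu(s). This is what lets NAF pick the greedy action in a continuous action space without a separate optimization step.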

Supplementary Material: zip
