Learning agents with prioritization and parameter noise in continuous state and action space

Rajesh Devaraddi; G. Srinivasaraghavan

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: Reinforcement learning (RL) problems can be solved in two different ways, the value function-based approach and the policy optimization-based approach, to eventually arrive at an optimal policy for the given environment. One of the recent breakthroughs in reinforcement learning is the use of deep neural networks as function approximators for the value function or Q-function. This has led to agents such as AlphaGo automatically learning to play games with better-than-human performance. Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are two such methods that have shown state-of-the-art results in recent times. Among the many variants of RL, an important class of problems is the one where the state and action spaces are continuous: autonomous robots, autonomous vehicles, and optimal control are all examples of problems that lend themselves naturally to reinforcement learning-based algorithms and have continuous state and action spaces. In this paper, we adapt and combine approaches such as DQN and DDPG in novel ways to outperform earlier results on continuous state and action space problems. We believe these results are a valuable addition to the fast-growing body of results on reinforcement learning, more so for continuous state and action space problems.

Keywords: reinforcement learning, continuous action space, prioritization, parameter noise, policy gradients

TL;DR: Improving the performance of an RL agent in the continuous action and state space domain by using prioritised experience replay and parameter noise.

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)
