Predicting Multiple Actions for Stochastic Continuous Control

Sanjeev Kumar; Christian Rupprecht; Federico Tombari; Gregory D. Hager

15 Feb 2018 (modified: 10 Feb 2022) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: We introduce a new approach to estimating continuous actions with actor-critic algorithms for reinforcement learning problems. Policy gradient methods usually predict, for any given state, either a single continuous action estimate or the parameters of a presumed distribution (most commonly Gaussian). This might not be optimal, as it may not capture the complete description of the target distribution. Our approach instead predicts M actions with the policy network (actor) and then uniformly samples one action at each state, during both training and testing. This allows the agent to learn a simple stochastic policy whose expected return is easy to compute. In all experiments, this facilitates better exploration of the state space during training and converges to a better policy.
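The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `tanh` actor, the function names, and the dimensions are all hypothetical, and the expected return is read here as the mean of the critic values over the M actions, which is an assumption that follows from uniform sampling rather than a formula stated on this page.

```python
import math
import random


def predict_actions(state, weights, num_actions, action_dim):
    """Hypothetical linear actor: map a state to M candidate actions in [-1, 1]."""
    actions = []
    for m in range(num_actions):
        action = []
        for d in range(action_dim):
            row = weights[m * action_dim + d]
            # tanh keeps every action component inside [-1, 1].
            action.append(math.tanh(sum(w * s for w, s in zip(row, state))))
        actions.append(action)
    return actions


def sample_action(actions, rng):
    """Pick one of the M candidate actions uniformly at random."""
    index = rng.randrange(len(actions))
    return index, actions[index]


def expected_value(q_values):
    """Assumed reading: with uniform sampling, the expected return is the
    mean of the critic values of the M candidate actions."""
    return sum(q_values) / len(q_values)


rng = random.Random(0)
state_dim, action_dim, num_actions = 4, 2, 3  # illustrative sizes only
weights = [
    [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
    for _ in range(num_actions * action_dim)
]
state = [rng.gauss(0.0, 1.0) for _ in range(state_dim)]

actions = predict_actions(state, weights, num_actions, action_dim)
index, action = sample_action(actions, rng)
```

The same `sample_action` call would be used at both training and test time, since the abstract states that sampling happens in both phases.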

TL;DR: We introduce a novel reinforcement learning algorithm that predicts multiple actions and samples from them.

Keywords: Reinforcement Learning, DDPG, Multiple Action Prediction

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco), [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym), [TORCS](https://paperswithcode.com/dataset/torcs)
