Keywords: continuous-time RL, time discretization, constrained MDP
TL;DR: We augment the action space with a dimension of continuous time for continuous environments.
Abstract: The physical world evolves continuously in time. Most prior work on reinforcement learning casts continuous-time environments as discrete-time Markov Decision Processes (MDPs) by discretizing time into constant-width decision intervals. In this work, we propose Continuous-Time-Controlled MDPs (CTC-MDPs), a continuous-time decision process that permits the agent to decide how long each action will last in the physical time of the environment. However, reinforcement learning in a vanilla CTC-MDP may result in agents learning to take infinitesimally small time scales for each action. To prevent such degeneration and allow users to control the computation budget, we further propose constrained CTC-MDPs, which require the average time scale to stay above a given threshold. We hypothesize that constrained CTC-MDPs will allow agents to "budget" fine-grained time scales for states where they may need to adjust actions quickly, and coarse-grained time scales for states where they can get away with a single decision. We evaluate our new CTC-MDP framework (with and without the constraint) on the standard MuJoCo benchmark.
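The mechanism described in the abstract can be illustrated with a minimal sketch: the agent's action vector gains one extra dimension, a duration, and the environment holds the chosen control for that much physical time. This is a hypothetical illustration, not the paper's implementation; the names (`CTCWrapper`, `tau_min`, `tau_max`, `base_dt`) and the clipping-based handling of the duration are assumptions for the sketch.

```python
# Hypothetical sketch of a CTC-MDP wrapper around a fixed-step simulator.
# The last action dimension is a duration tau, clipped to [tau_min, tau_max];
# the same control is held for round(tau / base_dt) base simulator steps.
import numpy as np

class CTCWrapper:
    """Wraps a simulator whose base step lasts `base_dt` seconds,
    exposing an action space augmented with a continuous duration."""

    def __init__(self, env, base_dt=0.01, tau_min=0.01, tau_max=0.5):
        self.env = env
        self.base_dt = base_dt
        self.tau_min = tau_min
        self.tau_max = tau_max
        self.elapsed = 0.0    # physical time simulated so far
        self.decisions = 0    # number of agent decisions made

    def step(self, augmented_action):
        # Split the augmented action into the control and its duration.
        control = augmented_action[:-1]
        tau = float(np.clip(augmented_action[-1], self.tau_min, self.tau_max))
        n = max(1, int(round(tau / self.base_dt)))
        total_reward = 0.0
        for _ in range(n):
            obs, reward, done = self.env.step(control)
            total_reward += reward
            if done:
                break
        self.elapsed += n * self.base_dt
        self.decisions += 1
        return obs, total_reward, done

    def average_time_scale(self):
        # The quantity the constrained CTC-MDP keeps above a threshold:
        # average physical duration per agent decision.
        return self.elapsed / max(self.decisions, 1)
```

The lower bound `tau_min > 0` in this sketch blocks literally zero-length actions, but on its own it does not implement the paper's average-time-scale constraint; that is the role of `average_time_scale()`, which a constrained learner would keep above the user's threshold.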