Reinforcement Learning with Elastic Time Steps

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning; Elastic Time Steps; Energy Efficiency; Data Efficiency; Off-Policy Optimisation.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We break the fixed time step assumption common in Reinforcement Learning, yielding faster and more energy-efficient policies.
Abstract: Reinforcement Learning (RL) is usually modelled as a Markov Decision Process (MDP), where an agent moves through time in discrete time steps. When applied outside of simulation, virtually all existing RL-based control systems keep the MDP assumptions and use a constant-rate control strategy, with a time step chosen empirically for the specific application environment. Controlling dynamic systems with learned policies at the highest, worst-case frequency needed to guarantee stability can demand computational and energy resources that are hard to provide with on-board hardware. Following the principles of reactive programming, we posit that applying control actions only when necessary can allow the use of simpler hardware and reduce energy consumption. To implement this reactive policy, we break the fixed-frequency assumption and propose RL with elastic time steps, where the policy determines both the next action and the duration of the next time step. We also derive a Soft Elastic Actor-Critic (SEAC) algorithm to compute the optimal policy in this new setting. We demonstrate the effectiveness of SEAC both theoretically and experimentally, driving an agent in a simulation of a simple world with Newtonian kinematics. Our experiments show higher average returns, shorter task completion times, and reduced energy consumption.
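The authors' SEAC implementation is in the supplementary material and is not reproduced here. As an illustration only, the sketch below shows one plausible way a soft actor-critic style policy head could be extended to emit a time-step duration alongside the action, which is the core idea of elastic time steps as described in the abstract. All names and parameters (ElasticTimeStepActor, min_dt, max_dt, the duration range, the network sizes) are hypothetical assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ElasticTimeStepActor(nn.Module):
    """Hypothetical sketch: a Gaussian policy whose output includes one
    extra dimension interpreted as the duration of the next time step."""

    def __init__(self, obs_dim, act_dim, min_dt=0.02, max_dt=0.5, hidden=256):
        super().__init__()
        # Assumed bounds (in seconds) on the elastic time step.
        self.min_dt, self.max_dt = min_dt, max_dt
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # act_dim action dimensions plus 1 dimension for the duration.
        self.mu = nn.Linear(hidden, act_dim + 1)
        self.log_std = nn.Linear(hidden, act_dim + 1)

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu(h)
        log_std = self.log_std(h).clamp(-5.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        sample = dist.rsample()        # reparameterised sample, as in SAC
        squashed = torch.tanh(sample)  # squash all outputs to (-1, 1)
        action = squashed[..., :-1]
        # Rescale the last dimension from (-1, 1) to [min_dt, max_dt].
        dt = self.min_dt + (squashed[..., -1:] + 1.0) * 0.5 * (self.max_dt - self.min_dt)
        return action, dt

# Usage example: one forward pass yields an action and a step duration.
actor = ElasticTimeStepActor(obs_dim=4, act_dim=2)
action, dt = actor(torch.randn(1, 4))
```

Under this reading, the environment (or robot controller) would hold the chosen action for dt seconds before querying the policy again, so fewer decisions are taken when the state evolves slowly; how the duration enters the reward and the critic targets is specific to SEAC and is left to the paper.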
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2750