Keywords: deep reinforcement learning, early exit neural networks, computational efficiency
TL;DR: We propose a general deep reinforcement learning method that uses early exit neural networks to speed up inference.
Abstract: The _"Bitter Lesson"_ from Richard S. Sutton emphasizes that AI methods which leverage computation tend to outperform those relying on human insight, underscoring the value of approaches that use computational resources efficiently. In deep reinforcement learning (DRL), this highlights the importance of reducing both training and inference time. While early exit neural networks, models that adapt their computation to input complexity, have proven effective in supervised learning, their use in DRL remains largely unexplored. In this paper, we propose the Budgeted EXit Actor (BEXA), a novel actor-critic architecture that integrates early exit branches into the actor network. These branches are trained via the underlying DRL method and use a constrained value-based criterion to decide when to exit, allowing the policy to dynamically adjust its computation. BEXA is general, easy to tune, and compatible with any off-policy actor-critic method. We evaluate BEXA with different DRL methods, such as SAC and TD3, on a suite of MuJoCo tasks. Our results demonstrate a substantial improvement in inference efficiency with minimal or no loss in performance. These findings highlight early exits as a promising direction for improving computational efficiency in DRL.
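As a rough illustration of the mechanism the abstract describes, the sketch below shows one way an actor with early-exit heads and a value-based exit test could look in PyTorch. The class name `EarlyExitActor`, the `threshold` parameter, and the simple Q-value thresholding rule are hypothetical stand-ins for exposition; they are not the paper's actual constrained criterion or training procedure.

```python
import torch
import torch.nn as nn

class EarlyExitActor(nn.Module):
    """Actor MLP with an early-exit action head after each hidden block (illustrative sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, n_blocks: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.exit_heads = nn.ModuleList()
        in_dim = obs_dim
        for _ in range(n_blocks):
            self.blocks.append(nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()))
            self.exit_heads.append(nn.Linear(hidden, act_dim))  # one action head per depth
            in_dim = hidden

    def forward(self, obs: torch.Tensor):
        """Return the action proposed at every exit, so all branches can be trained by the base DRL method."""
        h, actions = obs, []
        for block, head in zip(self.blocks, self.exit_heads):
            h = block(h)
            actions.append(torch.tanh(head(h)))
        return actions

    @torch.no_grad()
    def act(self, obs: torch.Tensor, critic, threshold: float):
        """Inference for a single observation: exit at the first branch whose action
        the critic rates above a (hypothetical) budgeted threshold."""
        h = obs
        for i, (block, head) in enumerate(zip(self.blocks, self.exit_heads)):
            h = block(h)
            action = torch.tanh(head(h))
            # Accept the early action if its estimated value clears the threshold;
            # otherwise fall through to the next (deeper) head. The last head always exits.
            if i == len(self.blocks) - 1 or critic(obs, action).item() >= threshold:
                return action, i
```

This is only a minimal sketch under the stated assumptions; the actual BEXA architecture, exit criterion, and constraint handling are described in the paper itself.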
Primary Area: reinforcement learning
Submission Number: 7707