Keywords: deep learning, reinforcement learning, evaluation
TL;DR: Deep reinforcement learning may suffer from a much larger optimization problem than is commonly understood. We can measure this using a practical sub-optimality gap.
Abstract: In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep policy under a changing state and action distribution leads to sub-optimal performance, or even collapse. This naturally raises the concern that even if the community creates improved exploration algorithms or reward objectives, those improvements may fall on the \textit{deaf ears} of optimization difficulties.
This work proposes a new \textit{practical} sub-optimality estimator to determine the optimization limitations of deep reinforcement learning algorithms. Through experiments across environments and RL algorithms, it is shown that the best data generated is $2$-$3\times$ better than the policies' learned performance. This large gap indicates that deep RL methods exploit only about half of the good experience they generate.
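The abstract does not spell out how the estimator is computed; the following is a minimal sketch of one plausible reading, in which the gap is the ratio between the best return observed in the collected experience and the learned policy's average evaluation return. All names (`episode_returns`, `policy_eval_returns`) are hypothetical and not from the paper.

```python
# Hypothetical sketch (not the authors' code) of a "practical sub-optimality gap":
# compare the best return seen in the collected experience against the
# learned policy's mean evaluation return.
import numpy as np

def practical_suboptimality_gap(episode_returns, policy_eval_returns):
    """Ratio of the best return ever generated during training to the
    learned policy's mean evaluation return (>1 means unexploited experience)."""
    best_generated = np.max(episode_returns)      # best trajectory the agent produced
    learned_perf = np.mean(policy_eval_returns)   # what the final policy actually achieves
    return best_generated / learned_perf

# Example: if the best collected episode scored 300 but the trained policy
# averages 120, the gap is 2.5x, in line with the 2-3x range quoted above.
print(practical_suboptimality_gap([300.0, 150.0, 90.0], [120.0, 110.0, 130.0]))
```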
Submission Number: 4