Latent Trajectory: A New Framework for Deep Actor-Critic Reinforcement Learning with Uncertainty Quantification
Abstract: Uncertainty quantification in deep learning is challenging due to the complexity of deep neural networks. This challenge is particularly pronounced in deep reinforcement learning (RL), where agents interact with stochastic environments. In deep actor-critic RL, it is further exacerbated by the interdependence between the actor and critic updates. Existing uncertainty quantification methods for RL are predominantly developed within the Bayesian framework. While these methods estimate the uncertainty of the value function, their confidence intervals are often misleading, with the coverage rate frequently falling well below the nominal level. To address this issue, we introduce a novel deep RL framework that treats transition trajectories as latent variables. Leveraging this framework, we propose an adaptive Stochastic Gradient Markov Chain Monte Carlo algorithm to train deep actor-critic models, which naturally accounts for the interdependence between the actor and critic updates. We provide theoretical guarantees for the convergence of the proposed method and offer empirical evidence for its effectiveness in uncertainty quantification of the value function. The proposed latent trajectory framework is highly flexible, allowing for the integration of advanced RL strategies to further enhance deep actor-critic learning.
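To make the SGMCMC-based actor-critic idea described in the abstract concrete, the sketch below pairs a stochastic-gradient Langevin dynamics (SGLD) critic update with a plain policy-gradient actor update. This is not the paper's adaptive algorithm or latent-trajectory construction; it is only a minimal illustration of how injecting Gaussian noise into critic training yields approximate posterior samples of the value function while the actor update remains coupled to the current critic sample. All names (MLP, sgld_critic_step, actor_step), hyperparameters, and shapes are assumptions for illustration.

```python
# Illustrative sketch only; the prior/regularizer term of SGLD is omitted for brevity.
import math
import torch
import torch.nn as nn
from torch.distributions import Categorical

class MLP(nn.Module):
    """Small feedforward network used for both actor (logits) and critic (scalar value)."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def sgld_critic_step(critic, s, r, s_next, done, gamma=0.99,
                     lr=1e-3, temperature=1.0, data_size=10_000):
    """One SGLD step on the critic: a gradient step on the mean TD loss plus
    Gaussian noise with std sqrt(2 * lr * temperature / data_size), so repeated
    steps behave like approximate posterior sampling of the value function."""
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1)
    loss = ((critic(s).squeeze(-1) - target) ** 2).mean()
    critic.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in critic.parameters():
            noise = torch.randn_like(p) * math.sqrt(2.0 * lr * temperature / data_size)
            p.add_(-lr * p.grad + noise)
    return loss.item()

def actor_step(actor, critic, s, a, r, s_next, done, gamma=0.99, lr=1e-3):
    """Advantage-weighted policy-gradient step; the advantage is computed with the
    current critic sample, which is where the actor and critic updates interdepend."""
    with torch.no_grad():
        adv = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1) - critic(s).squeeze(-1)
    dist = Categorical(logits=actor(s))
    loss = -(dist.log_prob(a) * adv).mean()
    actor.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in actor.parameters():
            p.add_(-lr * p.grad)
    return loss.item()
```

Alternating these two steps and, after a burn-in period, saving periodic snapshots of the critic parameters gives a collection of value-function samples; their empirical quantiles at a given state provide the kind of interval estimate whose coverage the paper evaluates.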
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. Related Work – We expanded the literature review to include more recent contributions on uncertainty quantification for reinforcement learning, providing clearer context for our proposed approach.
2. Clarified Definitions – Sections 2 and 3 were revised to refine and clarify key definitions, as suggested by reviewers, ensuring greater precision and readability.
3. Improved Figures – All experimental figures were revised and reformatted for improved clarity and readability.
4. Computation Cost – In the experimental section, we now explicitly discuss computational cost, including the impact of additional sampling steps on wall time and performance.
5. Sampling Step Selection – We added guidance on choosing the number of sampling steps m, supported by ablation results and practical recommendations.
6. Outperformance Scenarios – We included a short discussion of scenarios where the proposed method outperforms the baselines.
Supplementary Material: zip
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 5154