Pareto Value-Conditioned Networks for MORL in Stochastic Environments

Published: 31 Oct 2025, Last Modified: 31 Oct 2025, Venue: BNAIC/BeNeLearn 2025 Poster, Readers: Everyone, License: CC BY 4.0
Track: Type E (Late-Breaking Abstracts)
Keywords: Multiple Objectives, Reinforcement Learning
Abstract: Many key sequential decision problems, such as climate change mitigation or epidemic mitigation, have multiple conflicting objectives. Multi-objective reinforcement learning (MORL) algorithms can handle such problems. However, many MORL algorithms, and especially value-based ones, struggle with stochastic transitions. In this paper, we propose Pareto Value-Conditioned Networks (PVCN), a new method that builds on Pareto Conditioned Networks (PCN) and Pareto-optimal policy following (POPF) networks. PVCN effectively discovers Pareto-optimal policies in stochastic environments with accurate value estimates.
Submission Number: 86