Pareto Value-Conditioned Networks for MORL in Stochastic Environments

Liam P.H. Mertens; Ann Nowe; Diederik M Roijers

Published: 31 Oct 2025, Last Modified: 31 Oct 2025

Venue: BNAIC/BeNeLearn 2025 Poster

License: CC BY 4.0

Track: Type E (Late-Breaking Abstracts)

Keywords: Multiple Objectives, Reinforcement Learning

Abstract: Many key sequential decision problems, such as climate change mitigation or epidemic mitigation, have multiple conflicting objectives. Multi-objective reinforcement learning (MORL) algorithms can handle such problems. However, many MORL algorithms, especially value-based ones, struggle with stochastic transitions. In this paper, we propose Pareto Value-Conditioned Networks (PVCN), a new method that builds on Pareto Conditioned Networks (PCN) and Pareto-optimal policy following (POPF) networks. PVCN discovers Pareto-optimal policies in stochastic environments and provides accurate value estimates for them.

Submission Number: 86
