Track: Type E (Late-Breaking Abstracts)
Keywords: Multiple Objectives, Reinforcement Learning
Abstract: Many key sequential decision problems, such as climate change or epidemic mitigation, involve multiple conflicting objectives. Multi-objective reinforcement learning (MORL) algorithms are designed to handle such problems. However, many MORL algorithms, especially value-based ones, struggle with stochastic transitions. In this paper, we propose Pareto Value Conditioned Networks (PVCN), a new method that builds on Pareto Conditioned Networks (PCN) and Pareto-optimal policy following (POPF) networks. PVCN effectively discovers Pareto-optimal policies in stochastic environments while producing accurate value estimates.
Submission Number: 86