Utility-based soft masking for continual multi-objective reinforcement learning
Keywords: multi-objective reinforcement learning, continual learning
Abstract: Real-world decision-making usually involves balancing multiple, sometimes conflicting objectives according to user preferences that can be complex and potentially non-linear. These preferences, or utilities, can also change over time, requiring continual adaptation. This challenge is at the center of continual multi-objective reinforcement learning (CMORL) and remains largely understudied, with existing work limited to linear utilities. In this paper, we take a first step towards CMORL with non-linear utilities by proposing utility-based soft masking (UBSM). By generating a discretized representation of the utility and using it to soft-mask the policy's parameters, UBSM harnesses the structure of utility functions, allowing for greater knowledge transfer among them and supporting the learning of policies that adapt to dynamic preferences. We evaluate UBSM on classic multi-objective reinforcement learning environments, demonstrating its improvements over baselines and providing insights into the evaluation of CMORL algorithms.
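The mechanism the abstract describes (discretize the utility function, then use that representation to softly gate the policy's parameters) can be illustrated with a minimal sketch. This is not the authors' UBSM implementation: the grid-based discretization, the `SoftMaskedPolicy` module, and all names below are illustrative assumptions about one plausible realization in PyTorch.

```python
# Minimal sketch of utility-based soft masking, assuming a grid
# discretization of the utility and a sigmoid gate per policy weight.
# Illustrative only; not the paper's actual UBSM architecture.
import torch
import torch.nn as nn

def discretize_utility(utility_fn, grid):
    """Represent a (possibly non-linear) utility as its values on a fixed
    grid of reward vectors, yielding a flat feature vector."""
    return torch.stack([utility_fn(r) for r in grid])

class SoftMaskedPolicy(nn.Module):
    """Policy whose first-layer weights are gated elementwise by a soft
    mask in (0, 1) computed from the discretized utility, so different
    utilities activate different, partially shared subnetworks."""
    def __init__(self, obs_dim, act_dim, util_dim, hidden=64):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, obs_dim) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.head = nn.Linear(hidden, act_dim)
        # Maps the utility representation to one gate per weight of w1.
        self.mask_net = nn.Linear(util_dim, hidden * obs_dim)

    def forward(self, obs, util_repr):
        mask = torch.sigmoid(self.mask_net(util_repr)).view_as(self.w1)
        h = torch.tanh(obs @ (mask * self.w1).t() + self.b1)
        return self.head(h)

# Usage: two objectives with the non-linear utility u(r) = min(r1, r2).
grid = torch.cartesian_prod(torch.linspace(0, 1, 5), torch.linspace(0, 1, 5))
util_repr = discretize_utility(lambda r: torch.min(r[0], r[1]), list(grid))
policy = SoftMaskedPolicy(obs_dim=4, act_dim=2, util_dim=util_repr.numel())
logits = policy(torch.randn(4), util_repr)
```

Because the mask is soft (continuous in (0, 1)) rather than a hard binary selection, parameters can be reused across utilities that are close on the discretization grid, which is one way the knowledge transfer mentioned in the abstract could arise.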
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1177