Population-Based Multi-Objective Reinforcement Learning with Information Sharing and Differentiation
Abstract: To efficiently tackle problems with multiple conflicting objectives, several Multi-Objective Reinforcement Learning (MORL) algorithms use a single universal policy network that takes preference weights as input and represents the optimal policies for all preferences. However, training such a universal policy is challenging, because the network can easily forget, or fail to learn, the skills required for some preferences. To alleviate this issue, we propose an efficient Population-Based MORL (PB-MORL) method that trains multiple agents, each with a universal policy network, using a shared replay buffer. Each agent is biased towards specific objectives by applying differentiated weights to the rewards it samples from the buffer. As a result, each agent's policy only needs to handle a specific part of the preference space rather than the entire space, which simplifies training. Meanwhile, the experiences in the shared buffer enable information sharing among individuals, which can significantly reduce the number of environment interaction steps needed to train multiple agents. Experiments on both continuous and discrete tasks demonstrate the superiority of PB-MORL over several state-of-the-art MORL methods.
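The data flow described in the abstract (one replay buffer shared by the whole population, with each agent reweighting the sampled vector rewards towards its own part of the preference space) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two objectives, linear scalarization of the reward vector, and a hypothetical differentiation scheme in which each agent draws preference weights from a window of half-width `width` around an evenly spaced anchor. The names `SharedReplayBuffer`, `anchor_weights`, `sample_agent_weights`, and `agent_batch` are illustrative only, and the policy networks and their update step are omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Weights = Tuple[float, float]  # preference over two objectives, sums to 1


@dataclass
class Transition:
    obs: Tuple[float, ...]
    action: int
    reward_vec: Tuple[float, float]  # one reward per objective (two assumed here)
    next_obs: Tuple[float, ...]
    done: bool


class SharedReplayBuffer:
    """A single FIFO buffer that every agent in the population writes to and samples from."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.storage: List[Transition] = []
        self.pos = 0
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(t)
        else:  # once full, overwrite the oldest entry
            self.storage[self.pos] = t
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        return self.rng.sample(self.storage, min(batch_size, len(self.storage)))


def anchor_weights(n_agents: int) -> List[Weights]:
    """Hypothetical: spread agents evenly over the two-objective preference simplex."""
    if n_agents == 1:
        return [(0.5, 0.5)]
    return [(i / (n_agents - 1), 1 - i / (n_agents - 1)) for i in range(n_agents)]


def sample_agent_weights(anchor: Weights, width: float, rng: random.Random) -> Weights:
    """Draw a preference near the agent's anchor, clipped to [0, 1] (hypothetical scheme)."""
    w0 = min(1.0, max(0.0, anchor[0] + rng.uniform(-width, width)))
    return (w0, 1.0 - w0)


def agent_batch(buffer: SharedReplayBuffer, anchor: Weights, width: float,
                batch_size: int, rng: random.Random):
    """Sample shared experience and scalarize each vector reward with this agent's weights."""
    batch = []
    for t in buffer.sample(batch_size):
        w = sample_agent_weights(anchor, width, rng)
        scalar_r = w[0] * t.reward_vec[0] + w[1] * t.reward_vec[1]  # linear scalarization (assumed)
        batch.append((t, w, scalar_r))  # w would also be fed to the universal policy as input
    return batch


def main() -> None:
    rng = random.Random(42)
    buf = SharedReplayBuffer(capacity=1000, seed=1)
    # Stand-in for transitions collected by all agents; in a population every agent adds here.
    for step in range(200):
        buf.add(Transition((float(step),), step % 3, (rng.random(), rng.random()),
                           (float(step + 1),), False))
    anchors = anchor_weights(3)  # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    for i, anchor in enumerate(anchors):
        batch = agent_batch(buf, anchor, width=0.25, batch_size=32, rng=rng)
        mean_w0 = sum(w[0] for _, w, _ in batch) / len(batch)
        print(f"agent {i}: anchor={anchor}, mean w0={mean_w0:.2f}")


if __name__ == "__main__":
    main()
```

Every agent draws from the same pool of transitions, so experience collected by one agent is reused by all of them, while the per-agent weight window keeps each agent's scalarized training signal concentrated on its own region of the preference space. How wide the windows should be, and whether they overlap, is a design choice this sketch does not settle.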
External IDs: dblp:conf/ecai/WuZJLC025