Abstract: Multi-objective reinforcement learning (MORL) closely mirrors real-world conditions and has consequently gained attention. However, training a MORL policy from scratch is more challenging than its single-objective counterpart, as the policy must balance multiple objectives according to differing preferences during optimization. Demonstrations often embody a wealth of domain knowledge that can improve MORL training efficiency without requiring task-specific design. We propose demonstration-guided multi-objective reinforcement learning (DG-MORL), the first MORL algorithm that can seamlessly use prior demonstrations to enhance training efficiency. Our novel algorithm aligns prior demonstrations with latent preferences via corner weight support. We also propose a \textit{self-evolving mechanism} that gradually refines the demonstration set and prevents sub-optimal demonstrations from hindering training. DG-MORL offers a universal framework that can be combined with any MORL algorithm. Our empirical studies demonstrate DG-MORL's superiority over state-of-the-art MORL algorithms, establishing its robustness and efficacy. We also provide a lower bound on the algorithm's sample complexity and an upper bound on its Pareto regret.
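To make the two mechanisms named in the abstract concrete, below is a minimal, hypothetical Python sketch (not taken from the DG-MORL codebase). It assumes a list of corner weights, the vector returns of the prior demonstrations, and an `evaluate_policy` callback, and illustrates (i) matching each corner weight to its best-supporting demonstration via linear scalarization and (ii) a self-evolving update that replaces a demonstration once the current policy scores higher for that weight.

```python
# Hypothetical sketch of the two ideas named in the abstract; names such as
# corner_weights, demo_returns, and evaluate_policy are illustrative assumptions,
# not the authors' API.
import numpy as np

def assign_demos_to_corner_weights(corner_weights, demo_returns):
    """For each corner weight, pick the demonstration with the best scalarized return."""
    assignment = {}
    for i, w in enumerate(corner_weights):
        scores = [np.dot(w, r) for r in demo_returns]  # linear scalarization of vector returns
        assignment[i] = int(np.argmax(scores))
    return assignment

def self_evolve(corner_weights, demo_returns, evaluate_policy):
    """Replace a demonstration whenever the current policy outperforms it for that weight."""
    assignment = assign_demos_to_corner_weights(corner_weights, demo_returns)
    for i, w in enumerate(corner_weights):
        policy_return = evaluate_policy(w)             # vector return of the current policy
        demo_idx = assignment[i]
        if np.dot(w, policy_return) > np.dot(w, demo_returns[demo_idx]):
            demo_returns[demo_idx] = policy_return     # policy rollout becomes the new "demo"
    return demo_returns
```

This is only a sketch of the demonstration-refinement loop under the stated assumptions; the actual guidance and training procedure are described in the paper and released code.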
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=2YtqF9WVt1
Changes Since Last Submission: We have addressed the minor issues pointed out by the AE.
We have deanonymized the camera-ready version of the manuscript.
Code: https://github.com/MORL12345/DG-MORL
Supplementary Material: zip
Assigned Action Editor: ~Maxime_Gasse2
Submission Number: 3264