Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks

Luise Ge; Michael Lanier; Anindya Sarkar; Bengisu Guresti; Chongjie Zhang; Yevgeniy Vorobeychik

Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks

Luise Ge, Michael Lanier, Anindya Sarkar, Bengisu Guresti, Chongjie Zhang, Yevgeniy Vorobeychik

24 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multi-task reinforcement learning, meta reinforcement learning, personalized reinforcement learning

TL;DR: We propose an approach for learning a policy committee that enables effective generalization to diverse sets of tasks in multi-task MDPs.

Abstract: Many dynamic decision problems, such as robotic control, involve a series of tasks, many of which are unknown at training time. Typical approaches for these problems, such as multi-task and meta reinforcement learning, do not generalize well when the tasks are diverse. We propose a general framework to address this issue. In our framework, the goal is to learn a set of policies—a policy committee—such that at least one is near-optimal for most tasks that may be encountered at execution time. While we show that even a special case of this problem is inapproximable, we present two effective algorithmic approaches for it. The first of these yields provably approximation guarantees, albeit in small-dimensional settings (the best we can do due to inapproximability), whereas the second is a general and practical gradient-based approach. In addition, we provide provable sample complexity bounds for few-shot learning settings. Our experiments in personalized and multi-task RL settings using MuJoCo and Meta-World benchmarks show that the proposed approach outperforms state-of-the-art multi-task, meta-, and personalized RL baselines on training and test tasks, as well as in few-shot learning, often by a large margin.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3748

Loading