Learning to Explore with In-Context Policy for Fast Peer Adaptation

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Multi-agent Reinforcement Learning, In-Context Learning, Peer Adaptation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: a method that learns to actively explore and adapt to the policies of unknown peers in multi-agent settings.
Abstract: Adapting to different peers in multi-agent settings requires agents to quickly learn about the peer’s policy from a few interactions and act accordingly. In this paper, we present a novel end-to-end method that learns an in-context policy that actively explores the peer’s policy, recognizes its pattern, and adapts to it. The agent is trained on a diverse set of peer policies to learn how to balance exploration and exploitation based on the observed context, i.e., the history of interactions with the peer. When the context is uncertain, the agent proposes exploratory actions that elicit informative feedback from the peer and help infer its preferences. To encourage such exploration behavior, we introduce an intrinsic reward based on the accuracy of peer identification. When confident, the agent exploits the context to optimize its performance with the peer. We evaluate our method on two tasks that involve competitive (Kuhn Poker) or cooperative (Overcooked) interactions with peer agents. We demonstrate that our method induces active exploration behavior, achieving faster adaptation and better outcomes than existing methods.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7102