Aligning Agents like Large Language Models

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: imitation learning, reinforcement learning, preference learning, alignment
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We investigate whether we can align a large imitation learning agent by following the modern paradigm for training large language models.
Abstract: Training agents to behave as desired in complex 3D environments from visual information is challenging. Imitation learning from diverse human behaviour provides a scalable mechanism for training an agent with generally sensible behaviours, but such an agent may not perform the specific behaviours of interest when deployed. To address this issue, we draw an analogy between the undesirable behaviours of imitation learning agents and the unhelpful responses of unaligned large language models (LLMs). We then investigate how the procedure for aligning LLMs can be applied to aligning agents from pixels in a complex 3D environment. For our analysis, we utilise an academically illustrative part of a modern console game in which the human behaviour distribution is diverse, but we would like our agent to imitate a single mode of this behaviour. We find that we can align our base agent to consistently perform the desired behaviour, providing a demonstration of a general approach for training agents to perform specific behaviours in complex environments.
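The "procedure for aligning LLMs" referenced in the abstract typically pairs imitation-learned pretraining with preference-based fine-tuning. As a purely illustrative sketch of that second step (not the paper's actual implementation), the snippet below shows a Bradley-Terry reward-modelling loss in PyTorch; the `reward_model`, `preferred`, and `rejected` names are assumptions introduced here for clarity.

```python
# Minimal sketch of preference learning (reward modelling) as used in the
# standard LLM alignment recipe, assuming PyTorch. A generic `reward_model`
# scores a trajectory or response; names are illustrative only.
import torch
import torch.nn.functional as F


def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry loss: push the reward of the preferred sample
    above the reward of the rejected one."""
    r_pref = reward_model(preferred)   # shape: (batch,)
    r_rej = reward_model(rejected)     # shape: (batch,)
    # -log sigmoid(r_pref - r_rej), averaged over the batch
    return -F.logsigmoid(r_pref - r_rej).mean()
```

The learned reward would then drive a reinforcement-learning fine-tuning stage on top of the imitation-learned base agent, mirroring how RLHF is applied to a pretrained language model.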
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5797