LLM Policies for Text-based Reinforcement Learning: An Interactive Tutorial

Published: 07 Aug 2024, Last Modified: 07 Aug 2024
Venue: TAFM@RLC 2024
License: CC BY 4.0
Track Selection: Tutorial track.
Keywords: Large language models, reinforcement learning, text-based games, foundation models
TL;DR: This interactive tutorial shows how to train LLM policies for text-based reinforcement learning, covering quantization, low-rank adaptation, supervised fine-tuning with expert demonstrations, and reinforcement learning via proximal policy optimization.
Abstract: This tutorial considers key challenges and techniques in using large language model (LLM) policies for reinforcement learning. We present an interactive notebook demonstrating how to train an agent in the TextWorld environment. The tutorial covers key topics such as (1) parameterizing a policy using an LLM for text generation, (2) supervised fine-tuning with expert demonstrations, and (3) reinforcement learning with proximal policy optimization. It highlights strategies for efficient computation and memory management, including quantization and low-rank adaptation, which are crucial when computational resources are limited. The tutorial is designed for researchers who are familiar with reinforcement learning but may have limited hands-on experience training and fine-tuning LLMs. It is available as an interactive Google Colab notebook at https://colab.research.google.com/drive/17oQqcbIJeM3EIruP4T2ju_-LNQmuqqYg?usp=sharing and can be run on a standard GPU.
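
To give a flavor of the pipeline the abstract describes, the sketch below combines its memory-saving techniques (4-bit quantization and low-rank adaptation) with its first topic, parameterizing a policy as an LLM that generates text commands. This is a minimal illustration assuming the Hugging Face transformers, peft, bitsandbytes, and accelerate packages; the base model facebook/opt-350m, the prompt format, and the select_action helper are placeholders chosen for this sketch, not the notebook's actual code.

```python
# Minimal sketch (not the tutorial's code): a 4-bit quantized causal LLM with
# LoRA adapters, used as a text-generation policy for a TextWorld-style agent.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "facebook/opt-350m"  # illustrative placeholder base model

# Quantization: keep the frozen base weights in 4-bit NF4 to save GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adaptation: train small update matrices instead of the full weights.
lora_config = LoraConfig(
    r=16,              # rank of the low-rank update
    lora_alpha=32,     # scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

def select_action(observation: str, max_new_tokens: int = 16) -> str:
    """Treat the LLM as a policy: map a text observation to a text command."""
    prompt = f"Observation: {observation}\nCommand:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample, so the policy is stochastic as PPO expects
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, dropping the prompt.
    command = tokenizer.decode(
        output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return command.strip().split("\n")[0]

print(select_action("You are in a kitchen. There is a fridge here."))
```

From this starting point, the tutorial's remaining steps would train only the LoRA adapters: first supervised fine-tuning on expert demonstrations, then proximal policy optimization (e.g., with a library such as trl) using the game's score as the reward signal.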
Submission Number: 22