Keywords: reinforcement learning, LLM, VLM, Eureka, reward design, reward shaping, autonomous driving, autonomous racing, Gran Turismo
TL;DR: This paper explores a novel LLM-based automated reward design system for achieving superhuman performance in Gran Turismo 7.
Abstract: When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be a difficult process, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. We further show how an LLM-based approach can be used to build an interactive system that iteratively adapts the agent’s behavior to match the designer’s wishes. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we demonstrate how our system can be used to produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as generate novel behaviors, paving the way for practical automated reward design in real-world applications.
Submission Type: Research Paper (4-9 Pages)
Submission Number: 65