Evolutionary Reward Design and Optimization with Multimodal Large Language Models

Published: 22 Apr 2024, Last Modified: 04 May 2024, Venue: VLADR 2024 Poster, License: CC BY 4.0
Keywords: Reward Design, Multimodal Large Language Models
Abstract: Designing effective reward functions is a pivotal yet challenging task in Reinforcement Learning (RL), often demanding domain expertise and substantial effort. Recent studies have explored using Large Language Models (LLMs) to generate reward functions via evolutionary search. However, these approaches overlook the potential of multimodal information, such as images and videos. In particular, prior methods rely predominantly on numerical feedback from the RL environment to drive the evolution, neglecting visual data that could be obtained during training. This study introduces a novel approach that employs Multimodal Large Language Models (MLLMs) to craft reward functions tailored to various RL tasks. The MLLM is given the RL environment’s code, an image of the environment, and task information as context, and generates reward candidates. The chosen agent is then trained, and the numerical feedback from the environment, along with a recorded video of the top-performing policy, is returned to the MLLM as feedback. Through this iterative feedback mechanism within an evolutionary search, the MLLM consistently refines the reward function to maximize accuracy. Tests on two different agents across two distinct tasks indicate that our approach outperforms previous methods, which themselves outperformed 83% of reward functions designed by human experts.
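The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`Feedback`, `evolve_reward`, `toy_propose`, `toy_train`) is hypothetical, the MLLM call and the RL training run are replaced by deterministic toy stand-ins, and the video path is a placeholder string rather than a real recording.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """What is returned to the (hypothetical) MLLM after a training run."""
    reward_code: str   # source text of the reward candidate
    fitness: float     # numerical feedback from the environment
    video_path: str    # recorded rollout of the trained policy (placeholder)


def evolve_reward(
    propose: Callable[[str, str, str, Optional[Feedback], int], List[str]],
    train_and_evaluate: Callable[[str], Feedback],
    env_code: str,
    env_image: str,
    task: str,
    iterations: int = 3,
    samples: int = 4,
) -> Feedback:
    """Evolutionary search: sample candidates, train, feed the best one back."""
    best: Optional[Feedback] = None
    for _ in range(iterations):
        # The proposer sees the environment code, an image, the task,
        # and the feedback (numbers + video) of the best policy so far.
        candidates = propose(env_code, env_image, task, best, samples)
        results = [train_and_evaluate(code) for code in candidates]
        round_best = max(results, key=lambda r: r.fitness)
        if best is None or round_best.fitness > best.fitness:
            best = round_best
    assert best is not None  # holds whenever iterations >= 1 and samples >= 1
    return best


# --- Toy stand-ins (assumptions, for illustration only) ---------------------

def toy_propose(env_code, env_image, task, feedback, n):
    """Stand-in for an MLLM: perturb the weight of the best reward so far."""
    center = 0.0 if feedback is None else float(feedback.reward_code.split("*")[0])
    offsets = [-1.0, -0.5, 0.5, 1.0]
    return [f"{center + d} * progress" for d in offsets[:n]]


def toy_train(reward_code):
    """Stand-in for RL training: fitness peaks when the weight equals 2.0."""
    weight = float(reward_code.split("*")[0])
    return Feedback(reward_code, -(weight - 2.0) ** 2, f"rollout_{weight}.mp4")


best = evolve_reward(toy_propose, toy_train, "<env code>", "<env image>", "<task>")
print(best.reward_code)
```

In a real setting, `propose` would wrap a multimodal model call and `train_and_evaluate` would run an RL algorithm and record a video of the resulting policy; both are left abstract here because the abstract does not specify them.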
Supplementary Material: zip
Submission Number: 21