Keywords: Reinforcement Learning, Observation and Action Models, Large Language Model
Abstract: The design of observation and action models is a fundamental step in reinforcement learning (RL), as it defines how agents perceive and interact with their environment. Despite its importance, this design choice is often overlooked in standard benchmarks, which typically rely on handcrafted models, even though it can substantially influence both learning efficiency and final performance. To address this gap, we propose LOAM (LLM-based design of Observation and Action Models), a framework that leverages large language models (LLMs) to automate the generation of these models. LOAM extracts information about simulation variables and task objectives from the environment and uses an LLM to generate Python functions implementing the observation and action models, enabling seamless integration into standard RL training pipelines. Applied to the basic locomotion tasks of HumanoidBench (stand, walk, run), LOAM-designed models achieve over 3× faster learning on average than the default benchmark models under the same FastTD3 algorithm. Furthermore, to handle the variability of LLM outputs, we race multiple generated designs and progressively select the top performers under a fixed training budget. To our knowledge, this is the first work to propose an efficient learning method that mitigates the quality variability of LLM-designed observation and action models within the same timestep budget as standard single-model training.
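To make the pipeline described in the abstract concrete, the sketch below pairs a hypothetical LLM-generated observation/action model with a successive-halving-style race over candidate designs under a fixed budget. It is a minimal illustration only: all names (`make_observation`, `apply_action`, `race_designs`, `train_for_budget`, and the simulator keys like `"joint_pos"`) are assumptions for illustration, not the paper's actual API or generated code.

```python
import numpy as np

# Hypothetical LLM-generated observation model: maps raw simulator
# state variables to the feature vector the policy observes.
# Keys such as "joint_pos" stand in for the simulation variables
# LOAM would extract from the environment description.
def make_observation(sim_state: dict) -> np.ndarray:
    return np.concatenate([
        sim_state["joint_pos"],
        sim_state["joint_vel"],
        [sim_state["torso_height"]],
    ])

# Hypothetical LLM-generated action model: maps the policy's raw
# action vector into the actuator control ranges.
def apply_action(action: np.ndarray, ctrl_range: np.ndarray) -> np.ndarray:
    low, high = ctrl_range[:, 0], ctrl_range[:, 1]
    return low + (np.tanh(action) + 1.0) * 0.5 * (high - low)

# Successive-halving-style race (one plausible reading of the
# selection procedure): train every candidate design for a slice of
# the budget, keep the top half by score, and repeat until one remains.
def race_designs(designs, train_for_budget, total_budget):
    candidates = list(designs)
    rounds = max(1, int(np.ceil(np.log2(max(len(candidates), 2)))))
    per_round = total_budget // rounds
    for _ in range(rounds):
        if len(candidates) == 1:
            break
        scores = [train_for_budget(d, per_round) for d in candidates]
        keep = len(candidates) // 2
        ranked = sorted(zip(scores, range(len(candidates))), reverse=True)
        candidates = [candidates[i] for _, i in ranked[:keep]]
    return candidates[0]
```

Because each halving round trains fewer candidates, the total training cost stays within the same timestep budget as training a single model, which is the efficiency property the abstract claims.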
Primary Area: reinforcement learning
Submission Number: 23501