Keywords: Automated control policy discovery, Evolutionary computation, Multimodal large language models
TL;DR: We introduce MLES, a framework that employs multimodal large language models to analyze behavioral failures and guide evolutionary search, thereby enabling the efficient discovery of high-performance, human-readable programmatic policies.
Abstract: Deep reinforcement learning has achieved impressive success in control tasks. However, its policies, represented as opaque neural networks, are often difficult for humans to understand, verify, and debug, which undermines trust and hinders real-world deployment. This work addresses this challenge by introducing a novel approach for programmatic control policy discovery, called **M**ultimodal Large **L**anguage Model-assisted **E**volutionary **S**earch (MLES). MLES utilizes multimodal large language models as programmatic policy generators, combining them with evolutionary search to automate policy discovery. It integrates visual feedback-driven behavior analysis into the generation process to identify failure patterns and guide targeted improvements, thereby improving search efficiency and producing adaptable, human-aligned policies. Experimental results demonstrate that MLES achieves performance comparable to Proximal Policy Optimization (PPO) on two standard control tasks while providing transparent control logic and a traceable design process. The approach also overcomes the limitations of predefined domain-specific languages, facilitates knowledge transfer and reuse, and scales across diverse tasks, showing promise as a new paradigm for developing transparent and verifiable control policies.
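To make the described loop concrete, below is a minimal Python sketch of an MLES-style search as characterized in the abstract: an MLLM seeds a population of programmatic policies, diagnoses failure patterns from rendered rollouts, and proposes targeted revisions that compete under evolutionary selection. The functions `query_mllm`, `evaluate_policy`, and `render_episode` are hypothetical placeholders (not the authors' API), and the loop structure is an assumption based on the abstract, not the paper's actual implementation.

```python
# Hedged sketch of an MLES-style search loop. All helper functions are
# hypothetical stubs; only the control flow (MLLM generation + visual
# failure analysis + evolutionary selection) is illustrated here.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str       # human-readable programmatic policy (e.g., Python source)
    fitness: float  # mean episodic return on the control task


def query_mllm(prompt: str, images=None) -> str:
    """Hypothetical call to a multimodal LLM; returns text (policy code or analysis)."""
    raise NotImplementedError  # e.g., an API call to a vision-language model


def evaluate_policy(code: str) -> float:
    """Hypothetical: execute the policy in the environment, return mean reward."""
    raise NotImplementedError


def render_episode(code: str) -> list:
    """Hypothetical: roll out the policy and capture frames of its behavior."""
    raise NotImplementedError


def mles_search(task_description: str, pop_size: int = 8, generations: int = 20) -> Candidate:
    # 1. Seed the population with MLLM-generated programmatic policies.
    population = [
        Candidate(code, evaluate_policy(code))
        for code in (
            query_mllm(f"Write a control policy for: {task_description}")
            for _ in range(pop_size)
        )
    ]

    for _ in range(generations):
        # 2. Select a parent via tournament selection over fitness.
        parent = max(random.sample(population, k=3), key=lambda c: c.fitness)

        # 3. Visual feedback: the MLLM inspects rendered behavior to
        #    identify failure patterns (e.g., oscillation, overshoot).
        frames = render_episode(parent.code)
        analysis = query_mllm("Describe how this policy fails.", images=frames)

        # 4. Targeted improvement: ask the MLLM to revise the policy
        #    given the diagnosed failure.
        child_code = query_mllm(
            f"Policy:\n{parent.code}\nDiagnosis: {analysis}\n"
            "Revise the policy to fix this failure."
        )
        child = Candidate(child_code, evaluate_policy(child_code))

        # 5. Evolutionary replacement: the child displaces the weakest
        #    candidate if it improves on it.
        worst = min(population, key=lambda c: c.fitness)
        if child.fitness > worst.fitness:
            population.remove(worst)
            population.append(child)

    return max(population, key=lambda c: c.fitness)
```

Because every intermediate policy is source code plus an MLLM-written diagnosis, the discovered controller stays human-readable and its design process stays traceable, which is the transparency property the abstract emphasizes.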
Supplementary Material: zip
Primary Area: optimization
Submission Number: 17263