Keywords: Automated control policy discovery, Evolutionary computation, Multimodal large language models
TL;DR: We introduce MLES, a framework that employs multimodal large language models to analyze behavioral failures and guide evolutionary search, thereby enabling the efficient discovery of high-performance, human-readable programmatic policies.
Abstract: Deep reinforcement learning has achieved impressive success in control tasks. However, its policies, represented as opaque neural networks, are often difficult for humans to understand, verify, and debug, which undermines trust and hinders real-world deployment. This work addresses this challenge by introducing a novel approach for programmatic control policy discovery, called **M**ultimodal Large **L**anguage Model-assisted **E**volutionary **S**earch (MLES). MLES utilizes multimodal large language models as programmatic policy generators, combining them with evolutionary search to automate policy discovery. It integrates visual feedback-driven behavior analysis into the generation process to identify failure patterns and guide targeted improvements, thereby improving search efficiency and producing adaptable, human-aligned policies. Experimental results demonstrate that MLES achieves performance comparable to Proximal Policy Optimization (PPO) on two standard control tasks while providing transparent control logic and a traceable design process. The approach also overcomes the limitations of predefined domain-specific languages, facilitates knowledge transfer and reuse, and scales across diverse tasks, showing promise as a new paradigm for developing transparent and verifiable control policies.
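To make the described loop concrete, below is a minimal Python sketch of an MLES-style search as characterized in the abstract: an MLLM seeds a population of programmatic policies, diagnoses failure patterns from rendered rollouts, and proposes targeted revisions that compete under evolutionary selection. The functions `query_mllm`, `evaluate_policy`, and `render_episode` are hypothetical placeholders (not the authors' API), and the loop structure is an assumption based on the abstract, not the paper's actual implementation.

```python
# Hedged sketch of an MLES-style search loop. All helper functions are
# hypothetical stubs; only the control flow (MLLM generation + visual
# failure analysis + evolutionary selection) is illustrated here.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str       # human-readable programmatic policy (e.g., Python source)
    fitness: float  # mean episodic return on the control task


def query_mllm(prompt: str, images=None) -> str:
    """Hypothetical call to a multimodal LLM; returns text (policy code or analysis)."""
    raise NotImplementedError  # e.g., an API call to a vision-language model


def evaluate_policy(code: str) -> float:
    """Hypothetical: execute the policy in the environment, return mean reward."""
    raise NotImplementedError


def render_episode(code: str) -> list:
    """Hypothetical: roll out the policy and capture frames of its behavior."""
    raise NotImplementedError


def mles_search(task_description: str, pop_size: int = 8, generations: int = 20) -> Candidate:
    # 1. Seed the population with MLLM-generated programmatic policies.
    population = [
        Candidate(code, evaluate_policy(code))
        for code in (
            query_mllm(f"Write a control policy for: {task_description}")
            for _ in range(pop_size)
        )
    ]

    for _ in range(generations):
        # 2. Select a parent via tournament selection over fitness.
        parent = max(random.sample(population, k=3), key=lambda c: c.fitness)

        # 3. Visual feedback: the MLLM inspects rendered behavior to
        #    identify failure patterns (e.g., oscillation, overshoot).
        frames = render_episode(parent.code)
        analysis = query_mllm("Describe how this policy fails.", images=frames)

        # 4. Targeted improvement: ask the MLLM to revise the policy
        #    given the diagnosed failure.
        child_code = query_mllm(
            f"Policy:\n{parent.code}\nDiagnosis: {analysis}\n"
            "Revise the policy to fix this failure."
        )
        child = Candidate(child_code, evaluate_policy(child_code))

        # 5. Evolutionary replacement: the child displaces the weakest
        #    candidate if it improves on it.
        worst = min(population, key=lambda c: c.fitness)
        if child.fitness > worst.fitness:
            population.remove(worst)
            population.append(child)

    return max(population, key=lambda c: c.fitness)
```

Because every intermediate policy is source code plus an MLLM-written diagnosis, the discovered controller stays human-readable and its design process stays traceable, which is the transparency property the abstract emphasizes.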
Supplementary Material: zip
Primary Area: optimization
Submission Number: 17263