Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Large Language Models, Evolutionary Search, Multimodal Diagnostics, Agentic Memory
Abstract: Large multimodal language models can guide evolutionary search toward interpretable programmatic policies, but existing systems often ask a single model call both to inspect behavioral evidence and to rewrite code. This entangles visual diagnosis with repair and obscures why a given mutation was chosen. We present REFLEX, a training-free evolutionary framework that decouples these roles. A vision-enabled Critic converts task-specific Behavioral Evidence into structured JSON diagnoses, and a text/code Actor uses each diagnosis, together with snippets retrieved from a Skill Memory, to generate child policies. Across Lunar Lander, Acrobot, Pendulum, and compact concentric circular antenna array synthesis, the recorded REFLEX traces show rapid convergence: Lunar Lander reaches an NWS of 1.013 in 20 calls, Acrobot and Pendulum are solved in under 10 calls, and the antenna task matches a compact-regime score of 25.33. REFLEX provides auditable mutation traces and reusable cross-run knowledge while preserving the transparency of the generated code.
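To make the Critic/Actor split concrete, here is a minimal Python sketch of one decoupled evolutionary loop. It rests on assumptions and is not the REFLEX implementation. Everything in it is hypothetical: the JSON fields (`failure_mode`, `evidence`, `suggested_fix`), the tag-overlap `SkillMemory`, the toy gain-tuning task that stands in for a programmatic policy, and the stub `critic`/`actor` functions, which replace calls to a multimodal model and a code model. The sketch shows only the data flow. The Critic emits a structured diagnosis, memory retrieval is keyed on that diagnosis, the Actor produces a child, and every step is appended to an auditable trace.

```python
import json
from dataclasses import dataclass, field

TARGET_GAIN = 0.7  # hypothetical optimum of the toy task


@dataclass
class Diagnosis:
    # Hypothetical schema for the Critic's structured JSON output.
    failure_mode: str
    evidence: str
    suggested_fix: str


def parse_diagnosis(raw: str) -> Diagnosis:
    """Parse the Critic's JSON reply into the assumed schema."""
    data = json.loads(raw)
    return Diagnosis(data["failure_mode"], data["evidence"], data["suggested_fix"])


@dataclass
class SkillMemory:
    # Hypothetical cross-run store of (tags, snippet) pairs.
    entries: list = field(default_factory=list)

    def add(self, tags, snippet):
        self.entries.append((frozenset(tags), snippet))

    def retrieve(self, query_tags, k=2):
        # Rank by tag overlap; a stand-in for whatever retrieval REFLEX uses.
        q = set(query_tags)
        hits = [(len(tags & q), snip) for tags, snip in self.entries if tags & q]
        hits.sort(key=lambda h: h[0], reverse=True)
        return [snip for _, snip in hits[:k]]


def evaluate(policy):
    # Toy fitness: higher is better, 0.0 at the optimum.
    return -abs(policy["gain"] - TARGET_GAIN)


def critic(policy):
    # Stub Critic: a real one would inspect rendered Behavioral Evidence
    # with a vision-enabled model and return JSON.
    error = policy["gain"] - TARGET_GAIN
    return json.dumps({
        "failure_mode": "overshoot" if error > 0 else "undershoot",
        "evidence": f"gain error {error:+.3f}",
        "suggested_fix": "decrease gain" if error > 0 else "increase gain",
    })


def actor(policy, diagnosis, snippets):
    # Stub Actor: a real one would rewrite policy code via an LLM call.
    # It prefers a retrieved snippet and otherwise falls back to the diagnosis.
    if snippets:
        scale = snippets[0]["scale"]
    else:
        scale = 0.9 if "decrease" in diagnosis.suggested_fix else 1.1
    return {"gain": policy["gain"] * scale}


def run(initial, memory, generations=8):
    best, best_score = initial, evaluate(initial)
    trace = []  # auditable record of every diagnosis and mutation
    for gen in range(generations):
        diagnosis = parse_diagnosis(critic(best))
        snippets = memory.retrieve([diagnosis.failure_mode], k=1)
        child = actor(best, diagnosis, snippets)
        score = evaluate(child)
        accepted = score > best_score  # elitist acceptance
        trace.append({"gen": gen, "diagnosis": diagnosis.failure_mode,
                      "child_gain": round(child["gain"], 4),
                      "score": round(score, 4), "accepted": accepted})
        if accepted:
            best, best_score = child, score
    return best, best_score, trace


memory = SkillMemory()
memory.add(["overshoot"], {"scale": 0.8})
memory.add(["undershoot"], {"scale": 1.25})
best, score, trace = run({"gain": 2.0}, memory)
for row in trace:
    print(row)
```

In this toy run the loop improves the gain for several generations and then stalls, because the retrieved snippet's fixed step overshoots the optimum. That stall is the kind of pattern a mutation trace makes visible. Separating `critic` from `actor` means the trace can attribute each accepted or rejected child to a specific diagnosis.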
Submission Number: 302