Keywords: StarCraft II, VLM Agent, Benchmark
Abstract: We introduce **AVACraft** — the first multimodal benchmark environment for complex decision-making in StarCraft II, supporting both traditional Multi-Agent Reinforcement Learning (MARL) and modern Vision-Language Model (VLM) paradigms. Existing StarCraft II environments such as SMAC rely on abstract state representations that deviate from human perception and lack support for emerging VLM-based decision-making. AVACraft mitigates these limitations through a unified framework that provides RGB visual inputs, natural language observations, and structured state information, enabling systematic comparison between training-based and zero-shot decision-making methods.
Our benchmark features 21 carefully designed scenarios covering micromanagement, coordination, and strategic planning, with standardized evaluation protocols for both paradigms. We establish comprehensive baselines using four MARL algorithms (IQL, QMIX, QTRAN, VDN) and multiple state-of-the-art VLMs (GPT-4o, Qwen-VL, etc.). Experimental results reveal complementary strengths: MARL methods achieve up to a 27.1\% win rate after 1M training steps in complex scenarios, while VLMs deliver superior zero-shot performance (75–81\% win rate) and human-aligned decision processes without any training. Systematic analysis, including expert human evaluation, further identifies key trade-offs between training efficiency, performance ceilings, and interpretability across the two paradigms. Our implementation is available at https://anonymous.4open.science/r/VLM-Play-StarCraft2-70C4.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Language Modeling, Multimodality
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1475