Keywords: Natural Language Grounding, Instruction Following, Procedural Content Generation, Semantic Granularity, Structured Generation
Abstract: Natural-language-controllable procedural content generation depends critically on how linguistic concepts are grounded in structured representations. We show that widely used benchmarks rely on coarse semantic encodings that collapse distinct concepts, obscure grounding failures, and systematically inflate apparent instruction-following performance. Focusing on Super Mario level generation, we introduce MARIOPCG, a higher-fidelity dataset with expanded semantic coverage, and evaluate multiple decoder-only language models under controlled conditions. Increasing representational granularity exposes severe controllability failures in limited-capacity models that remain invisible under coarser benchmarks, while larger models exhibit stable behavior only when the representation supports meaningful grounding. These findings establish dataset semantic granularity as a necessary condition for valid evaluation of grounded language control and suggest that prior conclusions drawn from semantically collapsed benchmarks reflect representational artifacts rather than model capability. We will publicly release the dataset, prompts, and evaluation code to support reproducibility and further research.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Generation
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 7819