Keywords: video-game, llms, multi-agent, agent, animations
TL;DR: A new metric for multimedia evaluation and a multi-agent framework for video game generation
Abstract: Generating novel video games is a challenging problem. Large Language Models (LLMs) can generate games and animations, but the field lacks automated evaluation metrics, and models struggle with complex content. To tackle these issues, we introduce a new metric and a multi-agent system. First, we propose AVR-Eval, a metric for multimedia content in which a model compares the Audio-Visual Recordings (AVRs) of two pieces of content and determines which is better. We show that AVR-Eval reliably distinguishes good content from broken or mismatched content. Second, we build AVR-Agent, a multi-agent system that generates JavaScript code from a bank of multimedia assets (audio, images, 3D models) using AVR feedback. We show that AVR-Agent achieves higher AVR-Eval scores than one-shot prompting. However, while humans benefit from high-quality assets and audio-visual feedback, these do not significantly improve AVR-Eval scores for LLMs. This reveals a gap between human and AI content creation.
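To make the pairwise evaluation protocol concrete, here is a minimal Python sketch of a win-rate computation over AVR pairs; the `judge_prefers_first` interface, the file-path inputs, and the order-swapped querying to reduce position bias are illustrative assumptions, not details from the paper.

```python
from typing import Callable

def avr_eval_win_rate(
    avr_pairs: list[tuple[str, str]],                 # (candidate AVR, baseline AVR) paths -- assumed inputs
    judge_prefers_first: Callable[[str, str], bool],  # model-based judge; interface is hypothetical
) -> float:
    """Fraction of pairs where the candidate AVR is judged better than the baseline."""
    wins = 0.0
    for candidate, baseline in avr_pairs:
        # Query the judge in both presentation orders to reduce position bias
        # (a common practice for model judges; assumed here, not from the paper).
        forward = judge_prefers_first(candidate, baseline)
        backward = not judge_prefers_first(baseline, candidate)
        wins += (forward + backward) / 2  # 1.0 consistent win, 0.5 split, 0.0 consistent loss
    return wins / len(avr_pairs)
```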
Supplementary Material: zip
Primary Area: generative models
Submission Number: 19440