Keywords: video-game, llms, multi-agent, agent, animations
TL;DR: A new metric for multimedia evaluation and a multi-agent framework for video game generation
Abstract: Generating novel video games is a challenging problem. Large Language Models (LLMs) can generate games and animations, but the field lacks automated evaluation metrics, and models struggle with complex content. To tackle these issues, we introduce a new metric and a multi-agent system. First, we propose AVR-Eval, a metric for multimedia content in which a model compares the Audio-Visual Recordings (AVRs) of two pieces of content and determines which is better. We show that AVR-Eval reliably distinguishes good content from broken or mismatched content. Second, we build AVR-Agent, a multi-agent system that generates JavaScript code from a bank of multimedia assets (audio, images, 3D models) using AVR feedback. We show that AVR-Agent achieves higher AVR-Eval scores than one-shot prompting. However, while humans benefit from high-quality assets and audio-visual feedback, these do not significantly improve AVR-Eval scores for LLMs. This reveals a gap between human and AI content creation.
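To make the pairwise evaluation protocol concrete, here is a minimal Python sketch of a win-rate computation over AVR pairs; the `judge_prefers_first` interface, the file-path inputs, and the order-swapped querying to reduce position bias are illustrative assumptions, not details from the paper.

```python
from typing import Callable

def avr_eval_win_rate(
    avr_pairs: list[tuple[str, str]],                 # (candidate AVR, baseline AVR) paths -- assumed inputs
    judge_prefers_first: Callable[[str, str], bool],  # model-based judge; interface is hypothetical
) -> float:
    """Fraction of pairs where the candidate AVR is judged better than the baseline."""
    wins = 0.0
    for candidate, baseline in avr_pairs:
        # Query the judge in both presentation orders to reduce position bias
        # (a common practice for model judges; assumed here, not from the paper).
        forward = judge_prefers_first(candidate, baseline)
        backward = not judge_prefers_first(baseline, candidate)
        wins += (forward + backward) / 2  # 1.0 consistent win, 0.5 split, 0.0 consistent loss
    return wins / len(avr_pairs)
```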
Supplementary Material: zip
Primary Area: generative models
Submission Number: 19440