VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track (poster) · License: CC BY-NC 4.0
Keywords: Video Game Quality Assurance, Video Game Testing, Vision Language Models
TL;DR: A new benchmark for assessing VLMs' capabilities in real-world video game quality assurance tasks.
Abstract: With video games leading in entertainment revenues, optimizing game development workflows is critical to the industry's long-term success. Recent advances in vision-language models (VLMs) hold significant potential to automate and enhance various aspects of game development, particularly video game quality assurance (QA), which remains one of the most labor-intensive processes with limited automation. Measuring VLM performance on video game QA tasks and evaluating how well these models handle real-world scenarios requires standardized benchmarks, and current benchmarks fall short in addressing this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that encompasses a wide range of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos.
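For illustration, a glitch-detection query of the kind the abstract describes might be posed to a VLM roughly as sketched below. The model name, prompt wording, file name, and yes/no answer format are assumptions for illustration only, not the paper's actual evaluation harness.

```python
# Hypothetical sketch: asking a VLM whether a game frame contains a visual glitch.
# The model, prompt, and expected answer format are illustrative assumptions.
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# "frame.png" is a placeholder for a rendered game frame to be checked.
with open("frame.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Does this game frame contain a visual glitch? Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```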
Dataset URL: https://huggingface.co/datasets/taesiri/VideoGameQA-Bench
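Since the benchmark is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library as sketched below. The configuration and split names are not specified here, so check the dataset card at the URL above for the actual ones; this is a minimal sketch, not a verified loading recipe.

```python
# Minimal sketch of loading the benchmark from the Hugging Face Hub.
# If the dataset defines multiple configurations (e.g., one per QA task),
# pass the configuration name as the second argument to load_dataset.
from datasets import load_dataset

ds = load_dataset("taesiri/VideoGameQA-Bench")
print(ds)  # inspect the available splits and their features
```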
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 451