VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track (poster) · License: CC BY-NC 4.0
Keywords: Video Game Quality Assurance, Video Game Testing, Vision Language Models
TL;DR: A new benchmark for assessing VLMs' capabilities in real-world video game quality assurance tasks.
Abstract: With video games leading in entertainment revenues, optimizing game development workflows is critical to the industry's long-term success. Recent advances in vision-language models (VLMs) hold significant potential to automate and enhance various aspects of game development, particularly video game quality assurance (QA), which remains one of the most labor-intensive processes with limited automation. Measuring VLM performance on video game QA tasks and evaluating how well these models handle real-world scenarios requires standardized benchmarks, and current benchmarks fall short in addressing this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that encompasses a wide range of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack tasks, glitch detection, and bug report generation for both images and videos.
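For illustration, a glitch-detection query of the kind the abstract describes might be posed to a VLM roughly as sketched below. The model name, prompt wording, file name, and yes/no answer format are assumptions for illustration only, not the paper's actual evaluation harness.

```python
# Hypothetical sketch: asking a VLM whether a game frame contains a visual glitch.
# The model, prompt, and expected answer format are illustrative assumptions.
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# "frame.png" is a placeholder for a rendered game frame to be checked.
with open("frame.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Does this game frame contain a visual glitch? Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```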
Dataset URL: https://huggingface.co/datasets/taesiri/VideoGameQA-Bench
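Since the benchmark is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library as sketched below. The configuration and split names are not specified here, so check the dataset card at the URL above for the actual ones; this is a minimal sketch, not a verified loading recipe.

```python
# Minimal sketch of loading the benchmark from the Hugging Face Hub.
# If the dataset defines multiple configurations (e.g., one per QA task),
# pass the configuration name as the second argument to load_dataset.
from datasets import load_dataset

ds = load_dataset("taesiri/VideoGameQA-Bench")
print(ds)  # inspect the available splits and their features
```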
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 451