LLM GameLab: An Interactive Platform for Testing Large Language Models in Board Games

Paulina Morillo, Alex Terreros, Cèsar Ferri, José Hernández-Orallo

Published: 01 Jan 2026, Last Modified: 25 Jan 2026. License: CC BY-SA 4.0
Abstract: While large language models are constantly evaluated on various skills, such as math, general knowledge, and coding, their ability to understand and follow game rules has not yet been deeply explored. This ability is especially important because it reveals whether LLMs can operate within predefined limits without deviating or making illogical mistakes. This demo paper therefore presents a tool for interacting with LLMs in board games. The tool allows the creation of players backed by different large language models, which can be pitted against each other or play in human vs. LLM mode. The platform includes rules, predefined in prompts, for four simple games based on Tic-Tac-Toe and Connect Four. Each player can be evaluated by recording their illegal moves, wins, draws, losses, and response times. The application also allows the creation of new games, opening up the possibility of examining LLM behavior in situations they have not previously encountered.
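The per-player bookkeeping the abstract describes (illegal moves, wins, draws, losses, response times) could be sketched as follows. This is a minimal illustration, not the platform's actual code: the names `PlayerStats`, `legal_moves`, and `record_move` are hypothetical, and the board representation (a flat list of nine cells for Tic-Tac-Toe) is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class PlayerStats:
    # Hypothetical per-player counters, mirroring the metrics
    # listed in the abstract.
    wins: int = 0
    draws: int = 0
    losses: int = 0
    illegal_moves: int = 0
    response_times: list = field(default_factory=list)

def legal_moves(board):
    # Empty cells on a flat 3x3 Tic-Tac-Toe board (assumed encoding).
    return [i for i, cell in enumerate(board) if cell == " "]

def record_move(stats, board, move, elapsed):
    # Log the response time, then check legality; an illegal move
    # is counted rather than raising an error, so a model that
    # violates the rules can still be scored.
    stats.response_times.append(elapsed)
    if move not in legal_moves(board):
        stats.illegal_moves += 1
        return False
    return True
```

For example, on a board `list("X  O     ")`, a move to cell 0 (already occupied by "X") would increment `illegal_moves`, while a move to cell 1 would be accepted.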