Towards LLM4PCG: A Preliminary Evaluation of Open-Weight Large Language Models Beyond ChatGPT4PCG

Published: 01 Jan 2024 · Last Modified: 13 May 2025 · CoG 2024 · CC BY-SA 4.0
Abstract: This paper presents an initial step towards general evaluations of open-weight large language models (LLMs) using the ChatGPT4PCG platform, a Science Birds level generation challenge designed to evaluate LLMs on the complex task of generating stable, English-character-resembling, and diverse levels. While the ChatGPT4PCG competitions have their own merit in providing a comprehensive platform for evaluating ChatGPT on complex tasks, their exclusive focus on ChatGPT is rather limiting given the many available open-weight LLMs. We evaluate 13 LLMs from five model families, varying in design choices and sizes, on a modified ChatGPT4PCG 2 competition platform. We observe that the scaling law holds in general, but that the inherent capabilities of LLMs, arising from their pre-training and architecture choices, play an equally important role. We open-source our modification of the ChatGPT4PCG platform to support future research on evaluating LLMs in this area.¹

¹https://github.com/Pittawat2542/llm4pcg-python and https://github.com/Pittawat2542/llm4pcg-experiment