MineCEraft: Evaluating Language Models as Construction Engineers in the World of Minecraft

Published: 28 Apr 2026, Last Modified: 28 Apr 2026
Venue: MSLD 2026 Poster
License: CC BY 4.0
Keywords: Agentic AI, Embodied NLP
Abstract: Large Language Models (LLMs) are beginning to rival PhD-level knowledge, yet they remain underexplored in civil engineering and, in particular, in construction engineering. In this paper, we propose MineCEraft (Minecraft Construction Engineering Benchmark), an easy-to-use, open-source benchmark designed to systematically evaluate the reliability and limitations of LLMs on construction tasks in Minecraft. MineCEraft provides a safe and controllable experimental environment for assessing LLMs’ ability to perform realistic construction engineering activities. Using this benchmark, we conduct an in-depth evaluation of state-of-the-art LLMs and a detailed error analysis, revealing key failure modes and practical challenges in applying LLMs to construction engineering tasks.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 75