Keywords: Engineering construction, LLM, benchmark
TL;DR: We present BuildArena, a physics-aligned interactive benchmark that tests the engineering construction capabilities of frontier LLMs.
Abstract: Engineering construction automation aims to transform natural language specifications into physically viable structures, which requires complex, integrated reasoning under strict physical constraints. Modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, yet their construction competencies remain largely unevaluated. To address this gap, we introduce BuildArena, the first physics-aligned interactive benchmark designed for language-driven engineering construction, as a first step towards engineering automation with LLMs. It makes two technical contributions: (1) an extensible task design strategy spanning static and dynamic mechanics across multiple difficulty tiers; and (2) a 3D Spatial Geometric Computation Library that supports construction from language instructions. We use BuildArena to evaluate eight frontier LLMs on language-driven, physics-grounded construction automation. We release the code at https://anonymous.4open.science/r/BuildArena-9B7B/ to benefit construction automation in engineering applications.
Primary Area: datasets and benchmarks
Submission Number: 2752