PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models

ICLR 2026 Conference Submission 13535 Authors

18 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: Physical Computing, Large Language Models, AI for Education
TL;DR: We introduce PCEVAL, a new benchmark to evaluate LLMs' capabilities in physical computing, assessing both logical and physical circuit generation alongside code generation.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, including software development, education, and technical assistance, with software development being a key area of increasing adoption. However, their effectiveness under hardware constraints has not been fully explored, for instance in physical computing, where software must interact with and control physical hardware. To address this gap, we introduce PCEVAL (Physical Computing Evaluation), the first benchmark in physical computing that enables fully automatic evaluation of LLMs' capabilities in both the logical and physical aspects of projects, without requiring human assessment. Our evaluation framework assesses LLMs on generating circuits and producing compatible code across varying levels of project complexity. Through comprehensive testing of 13 leading models, PCEVAL provides the first reproducible and automatically validated empirical assessment of LLMs' ability to reason about fundamental hardware implementation constraints within a simulation environment. Our findings reveal that while LLMs perform well in code generation and logical circuit design, they struggle significantly with physical breadboard layout creation, particularly in managing proper pin connections and avoiding circuit errors. PCEVAL advances our understanding of AI assistance in hardware-dependent computing environments and establishes a foundation for developing more effective tools to support physical computing education.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 13535