LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines

Anoop Cherian; Radu Corcodel; Siddarth Jain; Diego Romeres

Published: 03 Feb 2026, Last Modified: 03 Feb 2026, AISTATS 2026 Poster, CC BY 4.0

TL;DR: A method that integrates dynamical world models with LLM physical reasoning via program synthesis.

Abstract: Most learning-based approaches to complex physical reasoning sidestep the crucial problem of identifying the parameters (e.g., mass, friction) that govern scene dynamics, despite its importance in real-world applications such as collision avoidance and robotic manipulation. In this paper, we present LLMPhy, a black-box optimization framework that integrates large language models (LLMs) with physics simulators for physical reasoning. The core insight of LLMPhy is to bridge the textbook physical knowledge embedded in LLMs with the world models implemented in modern physics engines, thereby enabling the construction of digital twins of input scenes through the estimation of latent parameters. Specifically, LLMPhy decomposes digital twin construction into two subproblems: a continuous one of estimating physical parameters and a discrete one of estimating scene layout. For each subproblem, LLMPhy iteratively prompts the LLM to generate programs embedding parameter estimates, executes them in the physics engine to reconstruct the scene, and then uses the resulting reconstruction error as feedback to refine the LLM’s predictions. As existing physical reasoning benchmarks rarely account for parameter identifiability, we introduce three new datasets, including one real-world task, specifically designed to evaluate this capability in a zero-shot setting. Our results show that LLMPhy achieves state-of-the-art performance on these tasks, recovers physical parameters more accurately, and converges more reliably than popular black-box methods.
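
The propose, simulate, and feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. A closed-form model of a block sliding to rest stands in for the physics engine, a hand-written shrinking-step search (`stand_in_proposer`) stands in for the LLM, and the proposer returns a bare friction value rather than a generated program. Only the shape of the loop (propose, simulate, score, feed the history back) is taken from the abstract.

```python
from typing import Callable, List, Tuple

G = 9.81  # gravitational acceleration in m/s^2

History = List[Tuple[float, float]]  # (friction estimate, reconstruction error)


def simulate_stop_distance(mu: float, v0: float = 2.0) -> float:
    """Toy stand-in for a physics engine: distance a block slides before
    stopping, given friction coefficient mu and initial speed v0."""
    return v0 ** 2 / (2.0 * mu * G)


def reconstruction_error(mu: float, observed: float) -> float:
    """Absolute gap between the simulated and the observed stopping distance."""
    return abs(simulate_stop_distance(mu) - observed)


def stand_in_proposer(history: History) -> float:
    """Hypothetical placeholder for the LLM: looks at past (estimate, error)
    pairs and proposes a new estimate near the best one seen so far."""
    if not history:
        return 0.5  # initial guess, chosen arbitrarily for this sketch
    best_mu, _ = min(history, key=lambda h: h[1])
    step = 0.4 * 0.7 ** len(history)  # step shrinks as feedback accumulates
    direction = 1.0 if len(history) % 2 else -1.0  # alternate search side
    return max(0.01, best_mu + direction * step)  # keep friction positive


def fit(observed: float,
        propose: Callable[[History], float],
        n_iters: int = 20) -> Tuple[float, float]:
    """Run the loop: propose, simulate, score, and feed the history back."""
    history: History = []
    for _ in range(n_iters):
        mu = propose(history)
        err = reconstruction_error(mu, observed)
        history.append((mu, err))
    return min(history, key=lambda h: h[1])


if __name__ == "__main__":
    observed = simulate_stop_distance(0.3)  # "ground truth" scene, mu = 0.3
    best_mu, best_err = fit(observed, stand_in_proposer)
    print(f"best mu = {best_mu:.4f}, error = {best_err:.4f} m")
```

In this sketch, swapping `stand_in_proposer` for a function that prompts an LLM with the history, and `simulate_stop_distance` for a call into a physics engine, would leave the loop in `fit` unchanged.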

Submission Number: 1320
