Keywords: Lean4, Reasoning, AIforScience
TL;DR: This paper presents Lean4PHYS, a reasoning framework for college-level physics problems in Lean4. It includes LeanPhysBench, the first Lean4 physics benchmark, and PhysLib, a community-driven repository that lays the foundation for the field.
Abstract: We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. To establish a solid foundation for formal reasoning in physics, **Lean4PHYS** introduces *PhysLib*, a community-driven, long-term-maintained repository containing fundamental unit systems and essential theorems for formulating physics proofs in Lean4. **Lean4PHYS** also includes *LeanPhysBench*, a college-level benchmark for evaluating LLMs' formal physics reasoning in Lean4, comprising 200 hand-crafted, peer-reviewed theorem statements formalized from university textbooks and physics competition problems. Using the *PhysLib* and *LeanPhysBench* components of **Lean4PHYS**, we establish extensive baselines with leading expert math provers and state-of-the-art closed-source models and analyze their performance. We find that most expert provers do not outperform general models as they do in the math domain, suggesting that these provers may be overfitting to mathematics rather than learning transferable formal reasoning. We further show that providing *PhysLib* in context improves LLMs' performance on *LeanPhysBench* by **11.90%** on average, demonstrating the repository's effectiveness in helping LLMs solve Lean4 physics problems. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.
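To illustrate the kind of statement the abstract describes, here is a minimal sketch of what a benchmark-style Lean4 theorem about uniformly accelerated motion might look like. This is an illustrative example only, not taken from *LeanPhysBench*; the theorem name and variable names are hypothetical, the real benchmark items build on *PhysLib*'s unit systems (not shown), and the snippet assumes Mathlib is available for `ℝ` and the `ring` tactic.

```lean
import Mathlib

-- Hypothetical LeanPhysBench-style statement (illustrative, not from the benchmark):
-- given the kinematic relation v = v₀ + a * t for constant acceleration,
-- the change in velocity equals acceleration times elapsed time.
theorem uniform_acceleration_velocity
    (v₀ a t v : ℝ) (h : v = v₀ + a * t) :
    v - v₀ = a * t := by
  rw [h]
  ring
```

A model evaluated on such a benchmark would receive the statement (everything up to `:= by`) and be asked to produce the proof script.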
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 7797