Keywords: Video-LLMs, Physical Video Reasoning, Benchmark
Abstract: We present \textsc{CoPhyBench}, a \textsc{Co}nditional-reasoning \textsc{Phy}sics-based \textsc{Bench}mark. \textsc{CoPhyBench} evaluates the ability of Video-LLMs to reason about physical events from conditional observations in real-world videos. It probes physics understanding from
three perspectives: 1) Prediction: predicting future events from observable cues, assessing models' grasp of causality in real-world scenarios. 2) Physical Calculation: estimating times and positions
by translating visual conditions into the variables of dynamics equations. 3) Counterfactual Reasoning: inferring outcomes under hypothetical changes, to distinguish generalizable physical understanding from superficial correlations.
To support these tasks, we construct a high-quality dataset of 1,300 carefully verified question-answer pairs grounded in 232 diverse real-world physics videos, spanning a range of phenomena in kinematics and dynamics.
Extensive benchmarking of leading Video-LLMs reveals that while models perform reasonably well on causal prediction, they struggle with precise physical calculation and counterfactual reasoning.
These findings highlight the limitations of current models in moving from semantic alignment to deeper, physics-grounded reasoning, and call for new training paradigms that incorporate physical reasoning. Our dataset and resources will be released.
Primary Area: datasets and benchmarks
Submission Number: 1668