ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · Readers: Everyone · License: CC BY 4.0
Keywords: 3D representation learning, physical parameter estimation, inverse physics, sparse-view
Abstract: Neural rendering has advanced 3D reconstruction and novel view synthesis, and integrating it with physics opens up new applications. The inverse problem of estimating physics from visual data, however, remains challenging, which limits its effectiveness for applications such as physically accurate digital twin creation in robotics and XR. Existing methods that incorporate physics into neural rendering frameworks typically require dense multi-view videos as input, making them impractical for scalable, real-world use. Given sparse multi-view videos, the sequential optimization strategy used by existing approaches suffers from significant error accumulation; for example, a poor initial 3D reconstruction leads to inaccurate material parameter estimation in subsequent stages. The alternative, simultaneous optimization of all parameters, also fails because the problem is highly non-convex and often non-differentiable. We propose ProJo4D, a progressive joint optimization framework that gradually enlarges the set of jointly optimized parameters until it reaches fully joint optimization over geometry, appearance, physical state, and material properties. Evaluations on both synthetic and real-world datasets show that ProJo4D outperforms prior work in 4D future state prediction and physical parameter estimation, demonstrating its effectiveness for physically grounded 4D scene understanding.
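The progressive schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the order in which parameter groups are unlocked, so the order used here is an assumption, and the names `toy_loss`, `finite_diff_grad`, and `progressive_joint_optimize` are hypothetical. The toy objective is a small convex quadratic over three scalar "groups" (appearance is omitted for brevity), whereas the real problem is described as highly non-convex and often non-differentiable. The sketch only shows the control flow: each stage adds parameters to the active set, and all active parameters are updated jointly.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def toy_loss(p: Params) -> float:
    """Hypothetical coupled objective, minimized at geometry=1, state=2, material=3."""
    return (
        (p["geometry"] - 1.0) ** 2
        + (p["state"] - p["geometry"] - 1.0) ** 2
        + (p["material"] - p["state"] - 1.0) ** 2
    )


def finite_diff_grad(
    loss: Callable[[Params], float], p: Params, name: str, eps: float = 1e-6
) -> float:
    """Central-difference estimate of d(loss)/d(p[name])."""
    up, down = dict(p), dict(p)
    up[name] += eps
    down[name] -= eps
    return (loss(up) - loss(down)) / (2.0 * eps)


def progressive_joint_optimize(
    loss: Callable[[Params], float],
    init: Params,
    stages: Sequence[Sequence[str]],
    steps_per_stage: int = 300,
    lr: float = 0.2,
) -> Tuple[Params, List[Tuple[Tuple[str, ...], float]]]:
    """Gradient descent in which each stage unlocks more parameters.

    Parameters unlocked in earlier stages stay active, so the final stage
    optimizes every listed parameter jointly.
    """
    params = dict(init)
    active: List[str] = []
    history: List[Tuple[Tuple[str, ...], float]] = []
    for newly_unlocked in stages:
        for name in newly_unlocked:
            if name not in active:
                active.append(name)
        for _ in range(steps_per_stage):
            # Compute all gradients first so the update is a joint step.
            grads = {n: finite_diff_grad(loss, params, n) for n in active}
            for n in active:
                params[n] -= lr * grads[n]
        history.append((tuple(active), loss(params)))
    return params, history


# Assumed unlock order, chosen for illustration only.
init = {"geometry": 0.0, "state": 0.0, "material": 0.0}
stages = [["geometry"], ["state"], ["material"]]
final, history = progressive_joint_optimize(toy_loss, init, stages)
for active_set, value in history:
    print(active_set, f"{value:.3e}")
```

In contrast to a sequential pipeline, which would freeze `geometry` before fitting `state`, parameters unlocked earlier here keep receiving gradient updates in later stages, so an imperfect early estimate can still be corrected once the other groups join the optimization.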
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9394