Abstract: Modeling human-object interactions is crucial for creating immersive virtual experiences, yet synthesizing 3D object dynamics conditioned on actions remains a challenging problem. Existing approaches equip static 3D objects with motion priors distilled from video diffusion models, but this methodology has two drawbacks: (i) video diffusion models are not physically grounded, so the generated videos may contain physical inaccuracies; (ii) video diffusion models cannot generate complex dynamics in which multiple objects interact under actions with long durations and large spatial extent. We present $\textbf{PhysInteract}$, a physics-based framework that (i) models interactions with a representation capturing their duration and contact information; (ii) estimates object material properties (e.g., Young's modulus) from the deformation that interactions induce; and (iii) uses physics simulation to reproduce realistic object dynamics from the estimated interactions and material properties. PhysInteract is fully differentiable, enabling joint optimization of interaction representations and object material properties. PhysInteract outperforms existing methods: we demonstrate this through quantitative evaluation on a curated dataset together with a user study, marking a step toward more realistic and immersive virtual experiences.
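To illustrate the idea of estimating material properties by differentiating through a simulator, the sketch below fits a Young's-modulus-like stiffness parameter to an observed deformation trajectory via gradient descent through a toy 1-D mass-spring simulation. This is a minimal, hypothetical example (the simulator, scaling, and optimizer settings are assumptions), not the PhysInteract implementation.

```python
# Hypothetical sketch: gradient-based estimation of a stiffness parameter
# through a toy differentiable simulator (not the paper's code).
import torch

def simulate(youngs_modulus, steps=100, dt=0.01):
    """Toy 1-D mass-spring rollout; stiffness is proportional to the modulus."""
    stiffness = youngs_modulus * 1e-3          # crude scaling for the toy example
    pos = torch.tensor(1.0)                    # initial displacement from rest
    vel = torch.tensor(0.0)
    trajectory = []
    for _ in range(steps):
        acc = -stiffness * pos                 # Hooke's law, unit mass
        vel = vel + dt * acc                   # semi-implicit Euler step
        pos = pos + dt * vel
        trajectory.append(pos)
    return torch.stack(trajectory)

# "Observed" deformation generated with a ground-truth modulus (assumed value).
with torch.no_grad():
    observed = simulate(torch.tensor(5.0))

# Material parameter to optimize, initialized away from the true value.
E = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.Adam([E], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    predicted = simulate(E)
    loss = torch.mean((predicted - observed) ** 2)  # match observed deformation
    loss.backward()                                  # gradients flow through the simulator
    optimizer.step()

print(f"estimated modulus: {E.item():.3f}")
```

In the full framework, the same differentiability would also let gradients update the interaction representation jointly with the material parameters.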
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ozan_Sener1
Submission Number: 6733