TwoSquared: 4D Generation from 2D Image Pairs

Published: 05 Nov 2025, Last Modified: 30 Jan 2026, 3DV 2026 Oral, CC BY 4.0
Keywords: 4D, deformation, 4D generation, shape analysis
TL;DR: 4D generation from image pair input
Abstract: Recovering 4D motion from sparse visual information, such as two temporal frames of a subject, is a significant challenge. While humans can hallucinate the missing information in a plausible way, generative AI struggles due to a lack of high-quality training data and heavy computing requirements. To overcome these limitations, we propose TwoSquared, a method that obtains a plausible 4D sequence from just two 2D RGB images corresponding to the beginning and the end of an action. We decompose the problem into two steps: 1) obtaining a 3D reconstruction of the initial and final states, and 2) modeling the intermediate sequence as a physically plausible deformation. Our method does not require templates or class-specific prior knowledge, and can operate on arbitrary in-the-wild examples. We demonstrate its capabilities on a number of objects that are diverse in nature, class, and deformation, surpassing video-based alternatives, which cannot achieve the same level of consistency.
Supplementary Material: pdf
Submission Number: 244