A Real2Sim Digital Twin Pipeline for Photorealistic Robot Simulation: Evaluating VLA Policy Deployment on a Bimanual Mobile Robot
Reviewer: ~Wei_Xu4
Keywords: Generative Digital Twins, Closed-Loop Policy Evaluation, Embodied AI, 3D Gaussian Splatting, Real2Sim
Abstract: Digital twins that are automatically constructed from robot sensor data offer a promising pathway for scalable Real2Sim and Sim2Real transfer. However, it remains an open question whether photorealistic reconstruction alone is sufficient to support reliable deployment of vision-language-action (VLA) policies.
We present a generative-AI-assisted Real2Sim pipeline that generates simulation-ready digital twins from real-world RGB observations with minimal manual intervention. The pipeline uses prompted segmentation to isolate scene components and a generative 3D model to directly produce simulation assets, eliminating the need for traditional multi-view reconstruction or manual 3D modeling.
To evaluate simulation fidelity, we deploy policies from two VLA models on the real robot and in the reconstructed simulation, using identical tasks and initial conditions. We then compare the resulting joint-level action trajectories and analyze how their divergence evolves over time during closed-loop execution.
Although the reconstructed environments are visually accurate, we observe increasing trajectory divergence during closed-loop operation. These results indicate that photorealistic reconstruction alone is insufficient to preserve closed-loop control behavior in VLA policies, particularly in contact-rich manipulation settings where small perceptual errors compound over time.
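The trajectory comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the divergence metric, so everything here is an assumption: the function names `per_step_divergence` and `divergence_slope` are hypothetical, the per-timestep Euclidean distance between joint-action vectors stands in for whatever measure the authors use, and a least-squares slope serves as a simple indicator of whether divergence grows over time.

```python
import math


def per_step_divergence(real, sim):
    """Euclidean distance between real and simulated joint-action vectors
    at each timestep (assumed metric, not taken from the paper).

    Both arguments are sequences of equal-length joint vectors. If the
    rollouts differ in length, only the shared prefix is compared.
    """
    horizon = min(len(real), len(sim))
    return [math.dist(real[t], sim[t]) for t in range(horizon)]


def divergence_slope(divergence, dt=1.0):
    """Least-squares slope of divergence against time.

    A positive slope indicates that the two trajectories drift apart as
    the closed-loop rollout proceeds.
    """
    n = len(divergence)
    if n < 2:
        raise ValueError("need at least two timesteps to fit a slope")
    times = [i * dt for i in range(n)]
    t_mean = sum(times) / n
    d_mean = sum(divergence) / n
    cov = sum((t - t_mean) * (d - d_mean) for t, d in zip(times, divergence))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


# Toy data (fabricated for illustration only): the simulated rollout
# drifts linearly away from the real one in the first joint.
real_actions = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
sim_actions = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]

divergence = per_step_divergence(real_actions, sim_actions)
slope = divergence_slope(divergence)
```

In practice the two rollouts would need to be time-aligned before comparison; this sketch assumes both are sampled at the same control rate from the same initial condition.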
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
PDF: pdf
Submission Number: 27