Verifying Digital-Twin Proxy Representations for Robust Sim2Real Locomotion Transfer

Chayanin Chamachot

Published: 26 May 2026, Last Modified: 27 May 2026 | Real2Sim2Real | Everyone | CC BY 4.0

Reviewer: ~Chayanin_Chamachot1

Keywords: digital twins, sim-to-real transfer, verification, robustness, transfer readiness, adaptive latent representations, domain randomization, locomotion, reinforcement learning, disentanglement

TL;DR: Six verification analyses show that factored auxiliary supervision fails to produce a decodable internal dynamics surrogate for sim2real locomotion; a tanh bottleneck—not supervision—drives observed robustness differences.

Abstract: Sim2real locomotion pipelines increasingly embed learned internal dynamics representations—compact latent states that function as adaptive surrogates of the deployment environment—inside policies for online adaptation. For such digital-twin proxy representations to support monitoring, diagnosis, and uncertainty-aware deployment, their fidelity must be directly verifiable, not merely inferred from downstream reward. We present a systematic verification protocol applying six complementary analyses (probes, interventions, Mutual Information Gap (MIG), Disentanglement–Completeness–Informativeness (DCI), Separated Attribute Predictability (SAP), mutual information) to a factored auxiliary-supervised latent (DynaMITE) on a Unitree G1 humanoid in Isaac Lab. The internal surrogate fails every fidelity check: probes yield $R^2 \approx 0$, clamping interventions produce negligible behavioral change, and standard disentanglement metrics are near zero. An unsupervised LSTM hidden state scores higher on every readout. A $2\times 2$ factorial ($n=10$) cleanly isolates the operative mechanism: a $\tanh$ information bottleneck—not the auxiliary supervision—drives observed robustness differences. A compound-mismatch transfer-readiness stress test (simultaneous friction, push, and delay perturbation, 10 seeds) reveals deployment-critical failure modes invisible to single-axis evaluation. Our results establish that a common factor-supervision assumption for internal twin representations does not survive verification, that compression architectures may provide robustness benefits independent of semantic factorization, and that sim2real pipelines relying on learned adaptive surrogates need direct fidelity checks before transfer.
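The probe analysis named in the abstract can be illustrated with a minimal sketch. It assumes a ridge-regularized linear readout scored by held-out $R^2$; the probe family, regularization, and data split used in the submission are not given here, so the function name `probe_r2`, its parameters, and the synthetic data below are hypothetical and for illustration only.

```python
import numpy as np


def probe_r2(latents, targets, ridge=1e-3, train_frac=0.8, seed=0):
    """Held-out R^2 of a ridge-regularized linear probe (illustrative sketch).

    latents: (n, d) array of latent vectors collected from a policy.
    targets: (n,) array of one ground-truth dynamics factor (e.g. friction).
    """
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    train, test = idx[:n_train], idx[n_train:]

    # Append a constant column so the probe can fit an intercept.
    X = np.hstack([latents, np.ones((n, 1))])

    # Closed-form ridge solution: (X^T X + ridge * I) w = X^T y.
    A = X[train].T @ X[train] + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X[train].T @ targets[train])

    # Score on the held-out split only.
    pred = X[test] @ w
    ss_res = np.sum((targets[test] - pred) ** 2)
    ss_tot = np.sum((targets[test] - targets[test].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions, not data from the submission).
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 8))                      # hypothetical latent vectors
decodable = z @ rng.normal(size=8) + 0.05 * rng.normal(size=2000)
undecodable = rng.normal(size=2000)                 # independent of z

r2_decodable = probe_r2(z, decodable)
r2_undecodable = probe_r2(z, undecodable)
print(f"decodable factor:   R^2 = {r2_decodable:.3f}")
print(f"undecodable factor: R^2 = {r2_undecodable:.3f}")
```

A latent that encodes the factor linearly gives a held-out $R^2$ near 1, while one that carries no information about it gives a value near 0, which is the pattern the abstract reports for the supervised latent.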

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 19
