Twist and Compute: The Cost of Pose in 3D Generative Diffusion

Published: 31 Oct 2025, Last Modified: 28 Nov 2025
Venue: EurIPS 2025 Workshop PriGM
License: CC BY 4.0
Keywords: 3D Generative Models, Canonical View Bias, Pose Estimation
TL;DR: Hunyuan3D 2.0 exhibits a canonical-view bias; a tiny pre-aligner network that reorients input images to a canonical pose markedly improves its 3D outputs, with no changes to the base model.
Abstract: Despite their impressive results, large-scale image-to-3D generative models remain opaque in their inductive biases. We identify a significant limitation in image-conditioned 3D generative models: a strong canonical-view bias. Through controlled experiments using simple 2D rotations, we show that the state-of-the-art Hunyuan3D 2.0 model struggles to generalize across viewpoints, with performance degrading under rotated inputs. Surprisingly, this failure can be mitigated by a lightweight CNN that detects and corrects the input orientation, restoring model performance without modifying the generative backbone. Our findings raise an important open question: is scale enough, or should we pursue modular, symmetry-aware designs?
Submission Number: 24
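
To make the pre-aligner idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: a tiny CNN classifies the 2D rotation of an input image among a few discrete angles and undoes it before the image reaches the unmodified image-to-3D backbone. The names (`PoseAligner`, `canonicalize`, `NUM_ANGLES`) and the choice of four 90-degree rotations are illustrative assumptions, consistent with the simple 2D-rotation probes described in the abstract.

```python
# Hypothetical sketch of a pre-aligner: a small CNN that predicts which
# discrete 2D rotation was applied to the input image, so it can be undone
# before the image is passed to the frozen 3D generative backbone.
# All names here are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

NUM_ANGLES = 4  # assumption: rotations drawn from {0, 90, 180, 270} degrees


class PoseAligner(nn.Module):
    """Tiny CNN that classifies the discrete rotation of an input image."""

    def __init__(self, num_angles: int = NUM_ANGLES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 64, 1, 1)
        )
        self.classifier = nn.Linear(64, num_angles)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) image batch; returns (B, num_angles) rotation logits
        return self.classifier(self.features(x).flatten(1))


@torch.no_grad()
def canonicalize(aligner: PoseAligner, image: torch.Tensor) -> torch.Tensor:
    """Detect the rotation of `image` (3, H, W) and rotate it back to canonical pose."""
    logits = aligner(image.unsqueeze(0))
    angle_idx = int(logits.argmax(dim=1))
    # Undo the detected rotation (angles are multiples of 90 degrees here)
    return TF.rotate(image, angle=-90 * angle_idx)


# Usage: canonical = canonicalize(trained_aligner, rotated_image)
# The canonicalized image is then fed to the unmodified image-to-3D model.
```

The key design point claimed in the paper is modularity: the aligner is trained separately on rotation prediction, and the generative backbone is never fine-tuned.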