Keywords: 3D generation
Abstract: Multi-stage generative models have shown great promise in 3D content creation because different stages can focus on generating structure or texture, but their outputs often fail to align with human preferences. The key bottleneck in applying alignment methods is the presence of non-differentiable operations between generative stages. This disconnection prevents preference signals applied to the final output from being backpropagated to the crucial early stages of generation, while naive stage-by-stage alignment leads to texture-geometry inconsistency. To address this challenge, we introduce Circular-DPO, which builds a preference feedback loop to align multi-stage 3D generation models with human preferences. Our method first applies Direct Preference Optimization (DPO) to refine the final 3D asset. We then construct new preference pairs by sampling and decoding assets generated by the optimized model. These newly formed pairs are used to train the preceding generative stage, effectively creating a feedback loop that bridges the non-differentiable gap. Furthermore, to enhance robustness against noisy data, we introduce a quality-aware weighting mechanism that prioritizes reliable preference pairs during training. Experiments demonstrate that our approach improves the alignment of generated 3D content with human preferences by enabling holistic, multi-stage optimization.
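The abstract describes a quality-aware weighted DPO objective that is applied first to the final stage and then, on newly constructed preference pairs, to the preceding stage. Below is a minimal, hypothetical sketch of such a per-pair weighted DPO loss; all function and variable names are illustrative assumptions and not the authors' implementation. In the circular scheme, the same loss would be reused on the pairs decoded from the aligned final stage to supervise the earlier stage.

```python
import torch
import torch.nn.functional as F

def quality_weighted_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                              ref_chosen_logp, ref_rejected_logp,
                              quality_weight, beta=0.1):
    """DPO loss over (chosen, rejected) pairs, scaled per pair by a quality weight.

    Hypothetical sketch: inputs are per-sample log-probabilities under the
    trainable policy and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen sample
    # over the rejected one, relative to the frozen reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Quality-aware weighting: unreliable (noisy) pairs contribute less to the loss.
    return (quality_weight * -F.logsigmoid(margin)).mean()

# Toy usage on a batch of 4 preference pairs (random stand-in log-probabilities).
policy_chosen = torch.randn(4, requires_grad=True)
policy_rejected = torch.randn(4, requires_grad=True)
ref_chosen, ref_rejected = torch.randn(4), torch.randn(4)
weights = torch.tensor([1.0, 0.8, 0.5, 0.2])  # per-pair reliability scores
loss = quality_weighted_dpo_loss(policy_chosen, policy_rejected,
                                 ref_chosen, ref_rejected, weights)
loss.backward()  # gradients flow only within the currently trained stage
```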
Primary Area: generative models
Submission Number: 10805