HFDream: Improving 3D Generation via Human-Assisted Multi-view Text-to-Image Models

June Suk Choi; Kyungmin Lee; DongJun Lee; Jinwoo Shin; Kimin Lee

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: generative models

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Learning from Human Feedback, Text-to-3D generation, Diffusion Model

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Large-scale text-to-image models have demonstrated potential for text-to-3D synthesis. However, existing approaches such as DreamFusion suffer from unstable 3D optimization because current text-to-image models struggle to synthesize images from certain viewpoints, even when the viewpoint is specified in the text prompt. Obtaining a dataset of view-aligned image-text pairs is challenging because such data is scarce and because view alignment is inherently subjective and ambiguous. In this paper, we propose to enhance text-to-3D generation by learning from human feedback for generating desired views. We generate multi-view images with the text-to-image model and engage human labelers to select a valid viewpoint. Using the human-labeled dataset, we train a reward model designed to verify whether a generated image aligns with the viewpoint specified in the text prompt. Finally, we fine-tune the text-to-image model to maximize the reward score. We find that our text-to-image diffusion models fine-tuned with human feedback, coined HFDream, consistently generate diverse viewpoints without the need for multi-view datasets created from 3D assets. When combined with view-dependent prompting in DreamFusion, this leads to high-quality text-to-3D generations with consistent geometry.
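
The view-dependent prompting mentioned in the abstract can be illustrated with a minimal sketch: a view label is appended to the text prompt according to the camera angle used for rendering. The function name `view_dependent_prompt`, the angle thresholds, and the exact label strings are illustrative assumptions, not details taken from this submission.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float, elevation_deg: float = 0.0) -> str:
    """Append a coarse view label to a prompt based on camera angles.

    Thresholds and label strings are assumptions chosen for illustration.
    """
    # Assumed rule: a steep camera elevation is described as an overhead view.
    if elevation_deg > 60.0:
        return f"{prompt}, overhead view"

    # Normalize azimuth to [0, 360); 0 degrees is assumed to face the front.
    az = azimuth_deg % 360.0
    if az < 45.0 or az >= 315.0:
        view = "front view"
    elif 135.0 <= az < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


if __name__ == "__main__":
    # Print the augmented prompt for a few camera azimuths.
    for angle in (0, 90, 180, 270):
        print(angle, view_dependent_prompt("a corgi", angle))
```

A text-to-image model that reliably follows such view labels is what makes this kind of prompting useful during 3D optimization, which is the gap the abstract says human feedback is meant to close.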

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7338
