TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract:

Recently, significant progress has been made in leveraging human feedback to enhance image generation, giving rise to a rapidly evolving research area. However, current work faces two critical challenges: i) insufficient feedback data; and ii) coarse-grained feedback learning. To tackle these challenges, we present TreeReward, a novel multi-dimensional, fine-grained, and adaptive feedback learning framework that improves both the semantic and aesthetic quality of diffusion models. Specifically, to address the scarcity of fine-grained feedback data, we first design an efficient feedback data construction pipeline in an "AI + Expert" fashion, yielding a high-quality feedback dataset of about 2.2M annotations covering six fine-grained dimensions. Building on this, we introduce a tree-structured reward model that exploits the fine-grained feedback data efficiently and provides tailored optimization during feedback learning. Extensive experiments on both Stable Diffusion v1.5 (SD1.5) and Stable Diffusion XL (SDXL) demonstrate the effectiveness of our method in enhancing general and fine-grained generation quality, as well as its generalizability to downstream tasks.
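The abstract does not specify how the tree-structured reward model combines the six fine-grained dimensions. Below is a minimal illustrative sketch, assuming (hypothetically) that per-dimension leaf scores are aggregated up a tree via weighted averages at each parent node; the node names, weights, and aggregation rule are all assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch: aggregating fine-grained reward scores in a tree.
# All names, weights, and scores here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardNode:
    """A node scoring one quality dimension; parents aggregate children."""
    name: str
    weight: float = 1.0          # weight relative to siblings
    score: float = 0.0           # leaf score (e.g. from a per-dimension head)
    children: List["RewardNode"] = field(default_factory=list)

    def reward(self) -> float:
        # Leaves return their own score; internal nodes return the
        # weighted average of their children's rewards.
        if not self.children:
            return self.score
        total_w = sum(c.weight for c in self.children)
        return sum(c.weight * c.reward() for c in self.children) / total_w

# Two coarse branches (semantic / aesthetic), each with fine-grained leaves.
root = RewardNode("overall", children=[
    RewardNode("semantic", weight=1.0, children=[
        RewardNode("alignment", score=0.8),
        RewardNode("counting", score=0.6),
    ]),
    RewardNode("aesthetic", weight=1.0, children=[
        RewardNode("color", score=0.9),
        RewardNode("composition", score=0.7),
    ]),
])

print(root.reward())  # → 0.75
```

A structure like this would let feedback learning target a single branch (e.g. only aesthetic dimensions) while still producing one scalar reward at the root.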

Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Generation] Generative Multimedia, [Experience] Multimedia Applications
Relevance To Conference: Our proposed method improves both fine-grained generation quality and generalization to downstream tasks, establishing its superiority over existing SD1.5- and SDXL-based text-to-image generation models.
Supplementary Material: zip
Submission Number: 1238