TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract:

Recently, significant progress has been made in leveraging human feedback to enhance image generation, giving rise to a rapidly evolving research area. However, current work faces two critical challenges: i) insufficient feedback data; and ii) coarse-grained feedback learning. To tackle these challenges, we present TreeReward, a novel multi-dimensional, fine-grained, and adaptive feedback learning framework that improves both the semantic and aesthetic quality of diffusion models. Specifically, to address the scarcity of fine-grained feedback data, we first design an efficient feedback data construction pipeline in an "AI + Expert" fashion, yielding a high-quality feedback dataset of about 2.2M annotations covering six fine-grained dimensions. Building on this, we introduce a tree-structured reward model that exploits the fine-grained feedback data efficiently and provides tailored optimization during feedback learning. Extensive experiments on both Stable Diffusion v1.5 (SD1.5) and Stable Diffusion XL (SDXL) demonstrate the effectiveness of our method in enhancing general and fine-grained generation quality, as well as its generalizability to downstream tasks.
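The abstract does not specify how the tree-structured reward model combines the six fine-grained dimensions. Below is a minimal illustrative sketch, assuming (hypothetically) that per-dimension leaf scores are aggregated up a tree via weighted averages at each parent node; the node names, weights, and aggregation rule are all assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch: aggregating fine-grained reward scores in a tree.
# All names, weights, and scores here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardNode:
    """A node scoring one quality dimension; parents aggregate children."""
    name: str
    weight: float = 1.0          # weight relative to siblings
    score: float = 0.0           # leaf score (e.g. from a per-dimension head)
    children: List["RewardNode"] = field(default_factory=list)

    def reward(self) -> float:
        # Leaves return their own score; internal nodes return the
        # weighted average of their children's rewards.
        if not self.children:
            return self.score
        total_w = sum(c.weight for c in self.children)
        return sum(c.weight * c.reward() for c in self.children) / total_w

# Two coarse branches (semantic / aesthetic), each with fine-grained leaves.
root = RewardNode("overall", children=[
    RewardNode("semantic", weight=1.0, children=[
        RewardNode("alignment", score=0.8),
        RewardNode("counting", score=0.6),
    ]),
    RewardNode("aesthetic", weight=1.0, children=[
        RewardNode("color", score=0.9),
        RewardNode("composition", score=0.7),
    ]),
])

print(root.reward())  # → 0.75
```

A structure like this would let feedback learning target a single branch (e.g. only aesthetic dimensions) while still producing one scalar reward at the root.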

Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Generation] Generative Multimedia, [Experience] Multimedia Applications
Relevance To Conference: Our proposed method improves both fine-grained generation quality and generalization to downstream tasks, establishing its superiority over existing SD1.5- and SDXL-based text-to-image generation models.
Supplementary Material: zip
Submission Number: 1238