Improving Human Pose-Conditioned Generation: Fine-tuning ControlNet Models with Reinforcement Learning
Keywords: Generative AI, Reinforcement Learning, Text-to-Image Generation, Image-to-Image Generation, Multimodal Learning
Abstract: Advances in diffusion-based text-to-image generation models have made it possible to create high-quality human images. However, generating humans in desired poses from text prompts alone remains challenging. Image-to-image generation methods that use additional image conditions can address this issue, but they often struggle to generate images that accurately match the conditioning images. This paper proposes a new framework for fine-tuning ControlNet models with reinforcement learning. It combines ControlNet with Denoising Diffusion Policy Optimization (DDPO) so that the models better follow pose conditioning images. Within this framework, we apply a novel reward function designed to increase pose accuracy. We demonstrate that our method improves human generation by increasing pose accuracy and by generating body parts correctly, without omissions or additions. In addition, we demonstrate that using a more detailed pose dataset together with our proposed reward function, which directly leverages keypoints, leads to improved training results.
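The abstract does not give the exact form of the keypoint reward or the training objective. The following is a minimal sketch of how the two pieces could fit together, offered under stated assumptions rather than as the paper's implementation. The reward is assumed to be an OKS-style keypoint similarity (the Gaussian-of-distance score used in COCO keypoint evaluation), chosen here as a stand-in. It scores an omitted or an extra joint as zero, mirroring the "no omissions or additions" goal. The update is assumed to follow DDPO's PPO-style clipped importance-sampling surrogate, applied per denoising step. All names (`keypoint_reward`, `normalize_advantages`, `ddpo_clipped_loss`) and parameters (`scale`, `kappa`, `clip_range`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A joint is an (x, y) pixel coordinate, or None when it is absent/undetected.
Keypoint = Optional[Tuple[float, float]]


def keypoint_reward(target: Sequence[Keypoint], detected: Sequence[Keypoint],
                    scale: float, kappa: float = 0.1) -> float:
    """Hypothetical OKS-style reward in [0, 1] comparing detected vs. target joints.

    A joint present in only one of the two lists (an omitted or an extra body
    part) scores 0; joints absent from both lists are skipped.
    """
    if len(target) != len(detected):
        raise ValueError("target and detected must list the same joints")
    if scale <= 0:
        raise ValueError("scale must be positive")
    sigma = scale * kappa  # distance tolerance in pixels (assumed form)
    total, counted = 0.0, 0
    for t, d in zip(target, detected):
        if t is None and d is None:
            continue  # joint absent in both: nothing to score
        counted += 1
        if t is None or d is None:
            continue  # omission or addition: contributes zero similarity
        dist2 = (t[0] - d[0]) ** 2 + (t[1] - d[1]) ** 2
        total += math.exp(-dist2 / (2.0 * sigma ** 2))
    return total / counted if counted else 0.0


def normalize_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize a batch of rewards into advantages (zero mean, unit variance)."""
    if not rewards:
        raise ValueError("need at least one reward")
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def ddpo_clipped_loss(log_prob_new: float, log_prob_old: float,
                      advantage: float, clip_range: float = 1e-4) -> float:
    """PPO-style clipped surrogate loss for one denoising step (to be minimized).

    The log-probabilities are those of the sampled denoising transition under
    the current model and the sampling-time model; clip_range is illustrative.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Take the pessimistic (larger) loss, as in PPO's clipped objective.
    return max(-advantage * ratio, -advantage * clipped_ratio)
```

In a training loop built on these assumptions, a pose detector would be run on each generated image to obtain `detected`. The rewards would then be normalized across the batch, and the clipped loss summed over the denoising steps of each sampled trajectory before the gradient step on the ControlNet weights.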
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10009