HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: Diffusion models have achieved remarkable success in generating realistic images but struggle to generate accurate human hands, often producing incorrect finger counts or irregular shapes. This difficulty arises from the complex task of learning the physical structure and pose of hands from training images, which involves extensive deformations and occlusions. For correct hand generation, our paper introduces a lightweight post-processing solution called $\textbf{HandRefiner}$. HandRefiner employs a conditional inpainting approach to rectify malformed hands while leaving other parts of the image untouched. We leverage a hand mesh reconstruction model that consistently adheres to the correct number of fingers and hand shape, while also being capable of fitting the desired hand pose in the generated image. Given a generated image that fails because of malformed hands, we utilize ControlNet modules to re-inject this correct hand information. Additionally, we uncover a phase transition phenomenon within ControlNet as we vary the control strength. This phenomenon enables us to take advantage of more readily available synthetic data without suffering from the domain gap between realistic and synthetic hands. Experiments demonstrate that HandRefiner can significantly improve generation quality both quantitatively and qualitatively. The code will be released.
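The "leaving other parts of the image untouched" property of conditional inpainting can be illustrated with a minimal mask-compositing sketch. This is not the authors' implementation: the function names `bbox_mask` and `composite_inpaint` are hypothetical, the "refined" image stands in for whatever a conditioned inpainting model would produce, and the box-shaped mask is an assumption made only for illustration.

```python
import numpy as np


def bbox_mask(shape_hw, box):
    """Build a binary mask (1 inside the box, 0 elsewhere).

    `box` is (top, left, bottom, right) in pixels; this box-shaped hand
    region is an illustrative assumption, not the paper's mask.
    """
    h, w = shape_hw
    top, left, bottom, right = box
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:bottom, left:right] = 1.0
    return mask


def composite_inpaint(original, refined, mask):
    """Keep `original` where mask == 0 and take `refined` where mask == 1."""
    original = np.asarray(original, dtype=np.float32)
    refined = np.asarray(refined, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    # Broadcast an (H, W) mask over the channel axis of an (H, W, C) image.
    if mask.ndim == original.ndim - 1:
        mask = mask[..., None]
    return mask * refined + (1.0 - mask) * original


if __name__ == "__main__":
    original = np.zeros((4, 4, 3), dtype=np.float32)  # stand-in generated image
    refined = np.ones((4, 4, 3), dtype=np.float32)    # stand-in inpainted output
    mask = bbox_mask((4, 4), (1, 1, 3, 3))            # hypothetical hand region
    out = composite_inpaint(original, refined, mask)
    print(out[..., 0])
```

Only pixels inside the masked region change; everything outside it is copied from the original image, which is the constraint the abstract describes.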
Relevance To Conference: This work presents an innovative approach to improving the realism and accuracy of human hands generated by text-to-image models, a key aspect of generative multimedia. Malformed human hands in generated images severely degrade the overall perception and effectiveness of generative multimedia communication. Motivated by this challenge, this work contributes to the broader field of multimedia processing by improving the quality of visual content, especially in the depiction of human hands. It advances the ability of generative multimedia systems to process and present visual information in a more refined and realistic manner, enhancing user experience across a wide range of applications.
Supplementary Material: zip
Primary Subject Area: [Generation] Generative Multimedia
Submission Number: 1526