When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: ControlNet excels at creating content that closely matches precise contours in user-provided masks. However, when these masks contain noise, as frequently occurs with non-expert users, the output includes unwanted artifacts. Through in-depth analysis, this paper first highlights the crucial role of controlling the impact of these inexplicit masks across diverse deterioration levels. Subsequently, to enhance controllability with inexplicit masks, we devise an advanced Shape-aware ControlNet consisting of a deterioration estimator and a shape-prior modulation block. The deterioration estimator assesses the deterioration factor of the provided masks. The modulation block then uses this factor to adaptively modulate the model's contour-following ability, which helps it dismiss the noisy parts of the inexplicit masks. Extensive experiments demonstrate its effectiveness in encouraging ControlNet to interpret inaccurate spatial conditions robustly rather than blindly following the given contours, and show that it suits diverse kinds of conditions. We showcase application scenarios such as modifying shape priors and composable shape-controllable generation. Code will be available soon.
Primary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: Our work focuses on spatially controllable generation with diffusion models, namely Stable Diffusion and ControlNet, and is therefore highly related to the topic "Generative Multimedia". We start with an in-depth analysis of ControlNet under deteriorated conditional masks and reveal severe performance degradation caused by inexplicit conditions. To handle this issue, we propose a novel Shape-aware ControlNet that realizes robust interpretation of inexplicit conditions. Our work provides valuable insights into the widely used ControlNet, making spatially controllable generation simpler and more flexible for non-expert users.
Supplementary Material: zip
Submission Number: 1525