MIRROR: Make Your Object-Level Multi-View Generation More Consistent with Training-Free Rectification
TL;DR: We introduce a plug-and-play method that rectifies multi-view inconsistencies in images generated by multi-view diffusion models, functioning in a training-free manner.
Abstract: Multi-view diffusion has greatly advanced 3D content creation by generating multiple images from distinct viewpoints, achieving remarkably photorealistic results. However, existing methods remain prone to inconsistent 3D geometric structures (commonly known as the Janus problem) and severe artifacts. In this paper, we introduce MIRROR, a versatile plug-and-play method that rectifies such inconsistencies in a training-free manner, yielding high-fidelity, realistic structures without compromising diversity. Our key idea is to trace the motion trajectories of physical points across adjacent viewpoints, so that each region can be rectified using neighboring observations of the same region. Technically, MIRROR comprises two core modules: a Trajectory Tracking Module (TTM), which performs pixel-wise trajectory tracking to label identical points across views, and a Feature Rectification Module (FRM), which explicitly adjusts each pixel embedding of the noisy synthesized images by minimizing its distance to the corresponding block features in neighboring views, thereby producing consistent outputs. Extensive evaluations demonstrate that MIRROR integrates seamlessly with a diverse range of off-the-shelf object-level multi-view diffusion models, significantly and efficiently improving both consistency and fidelity.
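To make the rectification step more concrete, here is a minimal sketch of the idea the abstract attributes to the FRM, written under explicit assumptions rather than taken from the authors' code. It assumes per-pixel correspondences between a view and its neighbors are already available as integer index maps (in MIRROR these would come from the TTM; here they are simply given, with -1 marking "no match"). The function names `block_feature` and `rectify_view`, the k x k block mean used as the "block feature", the squared-L2 objective, and the fixed number of plain gradient-descent steps are all hypothetical illustrative choices, not the paper's formulation.

```python
import numpy as np


def block_feature(feat, r, c, k=3):
    """Hypothetical 'block feature': mean of the k x k window centred at (r, c), clipped to bounds."""
    h, w, _ = feat.shape
    half = k // 2
    r0, r1 = max(r - half, 0), min(r + half + 1, h)
    c0, c1 = max(c - half, 0), min(c + half + 1, w)
    return feat[r0:r1, c0:c1].mean(axis=(0, 1))


def rectify_view(feats, v, neighbors, corr, steps=5, lr=0.1, k=3):
    """Pull each pixel embedding of view v toward block features of its matches in neighbor views.

    feats:     (V, H, W, C) noisy per-view feature maps (assumed layout).
    neighbors: indices of views adjacent to v.
    corr:      dict u -> (H, W, 2) int array of matched (row, col) in view u; -1 means no match
               (assumed to be produced by a trajectory-tracking step, not computed here).
    Minimises sum_u ||f - b_u||^2 per pixel with a few gradient steps; pixels with
    no match receive zero gradient and stay unchanged.
    """
    f = feats[v].copy()
    H, W, _ = f.shape
    target_sum = np.zeros_like(f)      # sum of matched block features per pixel
    count = np.zeros((H, W, 1))        # number of neighbor views that matched each pixel
    for u in neighbors:
        m = corr[u]
        for i in range(H):
            for j in range(W):
                r, c = m[i, j]
                if r < 0:
                    continue
                target_sum[i, j] += block_feature(feats[u], r, c, k)
                count[i, j] += 1
    valid = count[..., 0] > 0
    for _ in range(steps):
        # Gradient of sum_u ||f - b_u||^2 w.r.t. f is 2 * (n * f - sum_u b_u).
        grad = 2.0 * (count * f - target_sum)
        f = f - lr * grad
    return f, valid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(2, 4, 4, 2))
    # Toy correspondence: pixel (i, j) in view 0 sees the same point at (i, j + 1) in view 1.
    corr = np.full((4, 4, 2), -1, dtype=int)
    for i in range(4):
        for j in range(3):
            corr[i, j] = (i, j + 1)
    out, valid = rectify_view(feats, 0, [1], {1: corr}, k=1)
    print("matched pixels:", int(valid.sum()))
```

Stopping after a small, fixed number of steps (with `lr` below `1 / n` for `n` matching neighbors) moves each embedding only part of the way toward its neighbors' features, which is one simple way to keep some of the original sample; how MIRROR itself balances this, and at which denoising steps it applies rectification, is not specified in the abstract.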
Lay Summary: Creating realistic 3D images of objects from multiple angles is a major goal in computer graphics and AI. However, current methods often struggle to keep the shape of an object consistent when viewed from different directions. This can lead to strange visual errors, like seeing a different face from each angle, or even multiple faces appearing at the same time.
To solve this, we developed MIRROR, a simple and flexible tool that can improve the consistency of 3D-generated images without requiring any extra training. MIRROR identifies how each part of an object is represented from different angles and uses this information to fix errors and ensure consistent 3D appearance. This process helps produce 3D images that look more realistic and stable.
Our approach can be added to a variety of existing 3D image generators and makes them work better — not just in terms of visual quality, but also in how reliable and consistent the outputs are. This has promising applications in fields like design, virtual reality, and digital content creation.
Primary Area: Applications->Computer Vision
Keywords: 3D Generation, Diffusion Model, Training-Free
Submission Number: 2418