Abstract: Generative Neural Radiance Fields (NeRFs) have recently enabled efficient synthesis of 3D scenes by training on collections of unposed real images. However, existing methods for generating multi-view images of a given input image have limitations, such as requiring camera parameters or additional components to estimate them. In this paper, we propose ZIGNeRF, a novel learning-based approach for zero-shot 3D Generative Adversarial Network (GAN) inversion that generates multi-view images from a single input image without requiring camera parameters. Our method introduces a novel inverter that maps out-of-distribution images into the latent space of the 3D generator without additional training steps. We demonstrate the efficacy of ZIGNeRF on multiple real-world image datasets, including Cats, AFHQ, CelebA-HQ, CompCars, and CUB-200-2011. For example, ZIGNeRF achieves an FID of 14.77 for face image generation when trained on the CelebA-HQ dataset. Furthermore, ZIGNeRF can perform 3D operations such as 360-degree rotation and spatial translation by disentangling objects from the background. It can also generate style-mixed images by combining characteristics from two distinct input images, a pioneering capability in 3D-scene synthesis. Our approach opens up new possibilities for flexible and controllable 3D image generation from real-world data.
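The pipeline the abstract describes can be sketched schematically: an inverter maps an input image to a latent code, and a camera-conditioned generator renders that code from arbitrary viewpoints. The sketch below is a minimal toy in NumPy, not the ZIGNeRF implementation; the linear maps `invert` and `render` are hypothetical stand-ins for the paper's neural inverter and 3D generator, used only to show the data flow (image → latent → multi-view renderings without input-side camera parameters).

```python
import numpy as np

# Toy stand-ins for the neural inverter and 3D generator described in the
# abstract. All weights and dimensions here are illustrative assumptions.
rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM = 64, 8

W_enc = rng.standard_normal((LATENT_DIM, IMG_DIM)) * 0.1   # toy "inverter"
W_gen = rng.standard_normal((IMG_DIM, LATENT_DIM + 1)) * 0.1  # toy "generator"

def invert(image):
    """Map an input image to a latent code; no camera parameters are needed."""
    return W_enc @ image

def render(z, azimuth_deg):
    """Render the latent code under an explicitly chosen camera azimuth."""
    cond = np.append(z, np.deg2rad(azimuth_deg))  # condition on the pose
    return np.tanh(W_gen @ cond)

image = rng.standard_normal(IMG_DIM)   # a single (unposed) input image
z = invert(image)
# Multi-view synthesis: sweep the camera through a full 360-degree rotation.
views = [render(z, a) for a in range(0, 360, 45)]
```

Style mixing, as described in the abstract, would amount to inverting two images and combining their latent codes before rendering; the toy above leaves that out for brevity.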
External IDs: dblp:journals/access/KoKL25