ControlNeRF: Text-Driven 3D Scene Stylization via Diffusion Model

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · ICANN (2) 2024 · CC BY-SA 4.0
Abstract: 3D scene stylization aims to render artistically stylized images of a 3D scene from arbitrary viewpoints while keeping the style consistent across viewing angles. Traditional 2D stylization methods commonly used for this task struggle to maintain such consistency when applied to 3D environments. To address this issue, we propose ControlNeRF, a novel approach that employs a customized conditional diffusion model, ControlNet, and introduces latent variables, so that a stylized appearance is obtained throughout the scene driven solely by text. This text-driven formulation avoids the inconvenience of supplying reference images as style cues, and it achieves both a high degree of stylistic consistency across viewpoints and high image quality. Rigorous experiments with ControlNeRF across diverse styles confirm these results. Our approach advances 3D scene stylization and opens new possibilities for artistic expression and digital imaging.
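To make the general idea concrete, the sketch below shows how a single rendered NeRF view might be restyled from a text prompt using an off-the-shelf ControlNet-conditioned diffusion pipeline (Hugging Face diffusers). This is only an illustrative approximation, not the authors' implementation: the checkpoint names, file paths, and the choice of a Canny-edge ControlNet are assumptions, and the paper's customized ControlNet and latent-variable scheme are not reproduced here.

```python
# Illustrative sketch (assumed setup, not the paper's method): restyle one
# rendered NeRF view from a text prompt with a ControlNet-conditioned
# img2img diffusion pipeline.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# A Canny-edge ControlNet constrains the output to the scene's structure,
# so the text prompt changes style rather than geometry.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

rendered_view = load_image("nerf_render_view_000.png")   # hypothetical path
edge_map = load_image("nerf_render_view_000_canny.png")  # hypothetical path

stylized = pipe(
    prompt="a watercolor painting of the scene",  # style given purely by text
    image=rendered_view,       # NeRF rendering to be restyled
    control_image=edge_map,    # structure condition that helps view consistency
    strength=0.6,              # how far to depart from the original rendering
    num_inference_steps=30,
).images[0]
stylized.save("stylized_view_000.png")
```

In the paper's setting, consistency across viewpoints comes from optimizing the radiance field itself rather than stylizing each rendered view independently as this per-image sketch does.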