Abstract: We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks, utilizing a process called 3D distillation, which transfers knowledge from the 2D network to the 3D asset. Distillation necessitates at least tens of minutes per asset to attain satisfactory editing results, thus it is not very practical. In contrast, we ask whether 3D editing can be carried out directly by a feed-forward network, eschewing test-time optimization. In particular, we hypothesise that this process can be greatly simplified by first encoding 3D objects into a suitable latent space. We validate this hypothesis by building upon the latent space of Shap-E. We demonstrate that direct 3D editing in this space is possible and efficient by learning a feed-forward editor network that only requires approximately one second per edit. Our experiments show that Shap-Editor generalises well to both in-distribution and out-of-distribution 3D assets with different prompts and achieves superior performance compared to methods that carry out test-time optimisation for each edited instance.