Abstract: Highlights
• We propose a feed-forward encoder that directly transforms multi-view images of an object into a semantic volumetric neural representation.
• We propose new noise schedules and low-frequency noise techniques to effectively train diffusion models on the feature volumes.
• We conduct extensive experiments that demonstrate the excellent generation quality and efficient inference of our method.
External IDs: dblp:journals/cvgip/TangGWZBCG25