DILF: Differentiable rendering-based multi-view Image-Language Fusion for zero-shot 3D shape understanding

Xin Ning, Zai Yang Yu, Lusi Li, Weijun Li, Prayag Tiwari

Published: 2024, Last Modified: 26 Dec 2025. Venue: Inf. Fusion 2024. License: CC BY-SA 4.0

Abstract: Highlights
• A differentiable renderer fuses explicit text guidance into the rendering process to produce informative multi-view images.
• We propose a group-view mechanism and LLM-assisted textual feature learning, enabling efficient text–image fusion.
• DILF achieves state-of-the-art results in zero-shot 3D classification and is competitive in standard 3D classification.