DILF: Differentiable rendering-based multi-view Image-Language Fusion for zero-shot 3D shape understanding

Published: 01 Jan 2024, Last Modified: 17 Feb 2025
Venue: Inf. Fusion 2024
License: CC BY-SA 4.0
Abstract: Highlights
• A differentiable renderer fuses explicit text guidance into the rendering process to produce informative multi-view images.
• We propose the group-view mechanism and LLM-assisted textual feature learning, enabling efficient text–image fusion.
• DILF achieves state-of-the-art results in zero-shot 3D classification and competitive results in standard 3D classification.