Fusing differentiable rendering and language-image contrastive learning for superior zero-shot point cloud classification

Jinlong Xie, Long Cheng, Gang Wang, Min Hu, Zai Yang Yu, Minghua Du, Xin Ning

Published: 2024, Last Modified: 26 Dec 2025, Displays 2024, CC BY-SA 4.0

Abstract: Highlights
• This paper addresses duplication and redundancy in multi-view images for zero-shot classification.
• The model uses a differentiable rendering module to dynamically learn viewpoint parameters.
• Our method leverages large language models (LLMs) to enhance 3D visual and textual alignment.
• SSIM is added to the loss function to improve image distinguishability and classification accuracy.
• Achieved accuracies: 75.68% (ModelNet10), 66.42% (ModelNet40), and 52.03% (ScanObjectNN).
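The highlights do not say how SSIM enters the loss, so the following is only an illustrative sketch under stated assumptions. It computes the standard SSIM index over a single global window (the usual definition applies it over local windows and averages) and then averages it over all pairs of rendered views. The names `global_ssim` and `view_redundancy_penalty`, the flat-list image representation, and the idea of using mean pairwise SSIM as a redundancy term are hypothetical and are not taken from the paper.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Standard SSIM formula evaluated over one global window.

    x, y: equal-length flat sequences of pixel intensities (assumed layout).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    # Stabilising constants from the usual SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def view_redundancy_penalty(views):
    """Mean pairwise SSIM across views (hypothetical loss term).

    A lower value means the rendered views are more distinct from each other.
    """
    pairs = [(i, j) for i in range(len(views)) for j in range(i + 1, len(views))]
    if not pairs:
        raise ValueError("need at least two views")
    return sum(global_ssim(views[i], views[j]) for i, j in pairs) / len(pairs)


# Toy 2x2 "renders", flattened; values are made up for illustration.
view_a = [0.0, 0.5, 1.0, 0.5]
view_b = [1.0, 0.5, 0.0, 0.5]
duplicate_score = view_redundancy_penalty([view_a, view_a])
distinct_score = view_redundancy_penalty([view_a, view_b])
```

Under this assumption, a training objective could add such a term with a positive weight so that near-duplicate views are penalised. In a real pipeline the same formula would be written with differentiable tensor operations so that gradients reach the viewpoint parameters.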

External IDs: dblp:journals/displays/XieCWHYDN24