Fusing differentiable rendering and language-image contrastive learning for superior zero-shot point cloud classification

Published: 01 Jan 2024, Last Modified: 09 Nov 2025, Displays 2024, CC BY-SA 4.0
Abstract: Highlights
• This paper addresses duplication and redundancy in multi-view images for zero-shot classification.
• The model uses a differentiable rendering module to dynamically learn viewpoint parameters.
• Our method leverages large language models (LLMs) to enhance 3D visual and textual alignment.
• SSIM is added to the loss function to improve image distinguishability and classification accuracy.
• Achieved accuracies: 75.68% (ModelNet10), 66.42% (ModelNet40), and 52.03% (ScanObjectNN).
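The highlights do not say how SSIM enters the loss, so the following is only a minimal sketch of one plausible reading: penalizing the mean pairwise SSIM between rendered views so that redundant views are discouraged. The function names `global_ssim` and `view_redundancy_penalty` are hypothetical, and the single-window (global) SSIM and the flattened grayscale inputs in [0, 1] are simplifying assumptions, not the paper's implementation.

```python
from itertools import combinations
from statistics import fmean


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized, flattened grayscale images.

    Assumption: one global window instead of the usual sliding window.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def view_redundancy_penalty(views):
    """Mean pairwise SSIM over all views (hypothetical loss term).

    Lower values mean the rendered views are more distinct from each other.
    """
    pairs = list(combinations(views, 2))
    if not pairs:
        return 0.0  # a single view has nothing to be redundant with
    return fmean(global_ssim(a, b) for a, b in pairs)
```

Under this reading, such a term would be added to the classification objective with some weight, so that gradients flowing back through the differentiable renderer push the learned viewpoints apart. A differentiable tensor implementation would be needed in practice; the pure-Python version here only illustrates the arithmetic.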