Fusing differentiable rendering and language-image contrastive learning for superior zero-shot point cloud classification
Abstract: Highlights
• This paper addresses duplication and redundancy in multi-view images for zero-shot classification.
• The model uses a differentiable rendering module to dynamically learn viewpoint parameters.
• Our method leverages large language models (LLMs) to enhance 3D visual and textual alignment.
• SSIM is added to the loss function to improve image distinguishability and classification accuracy.
• Achieved accuracies: 75.68% (ModelNet10), 66.42% (ModelNet40), and 52.03% (ScanObjectNN).
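The highlights do not say how SSIM enters the loss. As a rough illustration only, the sketch below shows one plausible reading: penalizing the mean pairwise structural similarity between rendered views, so that lower values mean more distinguishable views. The names `global_ssim` and `view_redundancy_penalty` are hypothetical, and the single-window SSIM is a simplification of the usual sliding-window form. This is not the paper's implementation.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed over the whole image as a single window.

    Assumption: a simplification of the standard sliding-window SSIM,
    used here only to keep the sketch short.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def view_redundancy_penalty(views):
    """Mean pairwise SSIM across a set of rendered views.

    Hypothetical loss term: adding it to a training objective would
    discourage near-duplicate views. The paper's actual formulation
    may differ.
    """
    n = len(views)
    if n < 2:
        raise ValueError("need at least two views to compare")
    scores = [
        global_ssim(views[i], views[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.random((32, 32))
    view_b = rng.random((32, 32))
    # Duplicate views score higher (more redundant) than distinct ones.
    print(view_redundancy_penalty([view_a, view_a]))
    print(view_redundancy_penalty([view_a, view_b]))
```

In a real training loop this term would have to be computed with a differentiable tensor library so that gradients can reach the viewpoint parameters; NumPy is used here only to show the arithmetic.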
External IDs: dblp:journals/displays/XieCWHYDN24