Shape-Based Features Complement CLIP Features and Features Learned from Voxels in 3D Object Classification

Published: 23 Sept 2025, Last Modified: 27 Nov 2025 · NeurReps 2025 Poster · CC BY 4.0
Keywords: 3D features, shape-based features, geometric features, symmetry features, explicit vs implicit representations, 3D object classification
Abstract: Rezanejad et al. recently showed that symmetry-based contour descriptors improve convolutional neural network (CNN) performance on 2D scene categorization, suggesting that complex symmetry-based features are not necessarily learned or represented by CNNs. In this work, we investigate whether a similar phenomenon occurs in 3D visual data. Using 45,949 object instances from ScanNet spanning 440 classes, we evaluate ten model architectures across thirteen feature sets, including CLIP embeddings, features learned from voxels, and explicitly computed 3D descriptors: geometric statistics and symmetry-based features extracted with SymmetryNet. We observe that explicit geometric and symmetry-based descriptors consistently provide additional predictive information and improve test classification accuracy. We also study whether symmetry-based and geometric features can be recovered from CLIP embeddings, and we show that they are only partially recoverable. Our findings extend Rezanejad et al.'s 2D results to 3D and further demonstrate that symmetry-based and geometric features provide complementary information beyond foundation model embeddings and raw voxel representations. This provides preliminary evidence that global shape-based features may be useful in open-world 3D scene understanding.
Submission Number: 83