Abstract: Highlights
• A graph offset-based encoder, GoConv, is proposed to enhance the capture of local structure in point clouds.
• A joint point-view pretraining strategy is introduced to learn a unified representation of the two modalities.
• Point cloud and view information are mapped into the same feature space through cross-modal alignment.
• The proposed method is validated on standard and zero-shot classification tasks.
• The proposed method achieves excellent performance on 3D classification tasks without textual information.
External IDs: dblp:journals/pr/ZhangYWTYLN25