Abstract: Highlights
• A detailed analysis shows that 2D convolutions and 3D point-based convolutions have different inductive biases.
• A novel method, Cascaded Visual-Geometric Encoding, is proposed to fuse visual and geometric features.
• Extensive experiments demonstrate the efficacy of the proposed method.