Abstract: Highlights
• This paper proposes Objformer, a novel two-stage, end-to-end differentiable architecture for 3D object detection in point clouds.
• Equipped with a specially designed instance feature encoder, Objformer extracts clean instance features and a significant geometric prior of the target.
• By encoding the pseudo category label from the 3D proposals into the semantic feature of each instance, Objformer boosts information complementarity across objects through its instance interaction module.
• Objformer achieves state-of-the-art 3D object detection performance on SUN RGB-D and ScanNet. The significant performance gains on both benchmarks, together with the improvement over the multi-modal method, indicate the superiority of Objformer.