PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION

ICLR 2026 Conference Submission 22410 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: 3D Object Detection, Point Clouds, Two-stage, Transformer

Abstract: 3D object detection from LiDAR point clouds is important for autonomous driving systems. Recent two-stage 3D object detectors struggle to achieve satisfactory performance because of limited proposal quality. Two factors contribute: the high sparsity and uneven distribution of point clouds degrade the geometric detail in the generated proposal features, and the independent proposal refinement stage does not effectively exploit surrounding contextual cues. To this end, we propose a Proposal-centric Transformer Network (PTN), which includes a Hierarchical Attentive Feature Alignment (HAFA) module and a Collaborative Proposal Refinement Module (CPRM). More concretely, to obtain multi-granularity proposal representations, HAFA employs a dual-stream architecture that extracts both coarse-grained voxel features and fine-grained point features to enhance proposal features, then harmonizes them through a feature alignment network in a unified space. The CPRM first generates object queries for all objects and then establishes context-aware interactions to extract complementary information from semantically similar and spatially relevant proposals. PTN achieves promising performance on the large-scale Waymo and KITTI benchmarks, demonstrating its superiority.
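
The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the alignment network or the interaction mechanism, so the per-channel softmax fusion, the distance-penalized attention score, and the names `fuse_streams`, `refine_proposals`, and `dist_scale` are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(voxel_feat, point_feat):
    """Hypothetical stand-in for HAFA-style fusion: for each channel, blend the
    coarse (voxel) and fine (point) value with softmax weights (assumption)."""
    fused = []
    for v, p in zip(voxel_feat, point_feat):
        w_v, w_p = softmax([v, p])
        fused.append(w_v * v + w_p * p)
    return fused


def refine_proposals(features, centers, dist_scale=1.0):
    """Hypothetical stand-in for CPRM-style refinement: every proposal attends to
    all proposals. The score is scaled feature similarity minus a penalty on the
    distance between box centers (assumption), so semantically similar and
    spatially close proposals contribute more context."""
    dim = len(features[0])
    refined = []
    for q_idx, query in enumerate(features):
        scores = []
        for k_idx, key in enumerate(features):
            similarity = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            distance = math.dist(centers[q_idx], centers[k_idx])
            scores.append(similarity - dist_scale * distance)
        weights = softmax(scores)
        context = [
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(dim)
        ]
        # Residual connection: keep the proposal's own feature and add context.
        refined.append([a + b for a, b in zip(query, context)])
    return refined


if __name__ == "__main__":
    # Three toy proposals: two close together, one far away.
    voxel = [[0.2, 0.9], [0.3, 0.8], [0.9, 0.1]]
    point = [[0.4, 0.7], [0.1, 0.6], [0.8, 0.3]]
    centers = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (30.0, 0.0, 0.0)]
    fused = [fuse_streams(v, p) for v, p in zip(voxel, point)]
    for row in refine_proposals(fused, centers):
        print([round(x, 3) for x in row])
```

In this sketch the distant third proposal receives almost no attention weight from the first two, which mirrors the abstract's emphasis on spatially relevant proposals; a real detector would use learned projections and batched tensor operations instead.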

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 22410
