3DGS-Det: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

24 Sept 2024 (modified: 15 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: 3D Gaussian Splatting, 3D Object Detection, Neural Radiance Fields
Abstract: Neural Radiance Fields (NeRF) are a widely adopted class of methods for novel view synthesis. Some works have applied them to the 3D Object Detection (3DOD) task, paving the way for 3D object detection based on view-synthesis representations. However, NeRF has two inherent limitations: (1) limited representational capacity for 3DOD because it is an implicit representation, and (2) slow rendering speed. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation with faster rendering, overcoming these limitations. This paper is the first to introduce 3DGS into 3DOD, and it identifies two primary challenges: (a) 3DGS mainly focuses on 2D pixel-level parsing rather than 3D geometry, which leads to an unclear 3D spatial distribution of Gaussians and indistinct differentiation between objects and background, hindering 3DOD; (b) 2D images often contain many background pixels, so the densely reconstructed 3DGS contains many noisy points that represent the background and degrade detection. To address (a), we observe that 3DGS reconstruction originates from 2D images and design an elegant and efficient solution that incorporates **2D Boundary Guidance** to improve the spatial distribution of 3DGS. Specifically, we perform boundary detection on the posed images, overlay the detected boundaries on the images, and then train 3DGS on the result. Interestingly, as shown in Figure 1, this strategy significantly improves the spatial distribution of the Gaussians and yields clearer differentiation between objects and background. For (b), we propose a **Box-Focused Sampling** strategy that uses 2D boxes to establish object probability spaces, allowing probabilistic sampling of Gaussians that retains more object points and reduces background noise.
Benefiting from 2D Boundary Guidance and Box-Focused Sampling, our final method, **3DGS-DET**, achieves significant improvements (**5.6 points** on mAP@0.25, **3.7 points** on mAP@0.5) over the baseline version without the two proposed strategies, while introducing **zero** additional learnable parameters. Furthermore, 3DGS-DET significantly outperforms the state-of-the-art NeRF-based method, NeRF-Det, on both ScanNet and ARKitScenes. We commit to releasing all code and data within one month of paper acceptance.
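
The two strategies described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and every name in it is hypothetical: `overlay_boundaries`, `project_points`, `box_focused_sample`, and the `background_prob` parameter are all invented for illustration. It assumes a boundary mask is already available from some external boundary detector, a pinhole camera with intrinsics `K` and a 4x4 world-to-camera matrix, and a simple two-level probability (1.0 for Gaussian centers that project into any 2D box, `background_prob` otherwise) as a stand-in for the paper's object probability spaces, whose exact form the abstract does not specify.

```python
import numpy as np


def overlay_boundaries(image, boundary_mask, color=(255, 0, 0)):
    """Paint boundary pixels onto a copy of an RGB image.

    image: (H, W, 3) uint8 array; boundary_mask: (H, W) boolean array
    produced by any boundary detector (assumed to exist upstream).
    """
    out = image.copy()
    out[boundary_mask] = color
    return out


def project_points(points, K, world_to_cam):
    """Project (N, 3) world points to pixels; returns (uv, depth)."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]
    depth = cam[:, 2]
    # Guard the division for points at or behind the camera plane.
    safe_depth = np.where(depth > 1e-6, depth, 1.0)
    uv = (K @ cam.T).T[:, :2] / safe_depth[:, None]
    return uv, depth


def box_focused_sample(centers, views, n_keep, background_prob=0.1, rng=None):
    """Sample indices of Gaussian centers, favoring those inside 2D boxes.

    views: iterable of (K, world_to_cam, boxes), with boxes given as
    rows of (x1, y1, x2, y2) in pixel coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    prob = np.full(len(centers), float(background_prob))
    for K, world_to_cam, boxes in views:
        uv, depth = project_points(centers, K, world_to_cam)
        for x1, y1, x2, y2 in boxes:
            inside = (
                (depth > 1e-6)
                & (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
                & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
            )
            prob[inside] = 1.0  # assumed object probability
    n_keep = min(n_keep, len(centers))
    # Weighted sampling without replacement over all centers.
    return rng.choice(len(centers), size=n_keep, replace=False,
                      p=prob / prob.sum())
```

In this sketch, the overlaid images would replace the original posed images as 3DGS training inputs, and the sampled indices would select which Gaussians are passed to the detector. Keeping `background_prob` above zero means some background points survive, so the sampling stays probabilistic instead of becoming a hard crop.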
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3361