Keywords: Large Language Model, object detection, exploration, semantic labeling, Large Vision Model
TL;DR: An iterative method that actively uses an LLM to produce consistent object detections in 3D space.
Abstract: Maintaining semantic label consistency across multiple views is a persistent challenge in 3D semantic object detection. Existing zero-shot approaches that combine 2D detections with vision-language features often suffer from bias toward non-descriptive viewpoints and require a fixed label list to operate. We propose a truly open-vocabulary algorithm that uses large language model (LLM) reasoning to relabel multi-view detections, mitigating errors from poor or ambiguous viewpoints and from occlusions. Our method actively samples informative views based on feature diversity and uncertainty, generates new label hypotheses via LLM reasoning, and recomputes confidences to build a spatial-semantic representation of objects. Experiments on controlled single-object and diverse multi-object scenes show over 40\% improvement in accuracy and sampling rate over ubiquitous fusion methods using YOLO and CLIP. We demonstrate in multiple cases that our LLM-guided Active Detection and Reasoning (LADR) balances detail preservation with reduced ambiguity and a low sampling rate.
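The active-sampling and confidence-recomputation loop described in the abstract can be sketched in simplified form. This is a hypothetical illustration, not the authors' implementation: the entropy-based view selection, the confidence-weighted label fusion, and all function and variable names here are assumptions standing in for the paper's actual feature-diversity criterion and LLM relabeling step.

```python
import math
from collections import defaultdict


def view_uncertainty(label_probs):
    """Shannon entropy of one view's label distribution.

    Higher entropy = more ambiguous view = more informative to resample
    (assumed proxy for the paper's uncertainty criterion).
    """
    return -sum(p * math.log(p) for p in label_probs.values() if p > 0)


def select_next_view(views):
    """Pick the view whose current label distribution is most uncertain."""
    return max(views, key=lambda v: view_uncertainty(views[v]))


def fuse_labels(relabeled_views):
    """Confidence-weighted vote over (label, confidence) pairs.

    Stands in for the recomputed per-object label confidences; an LLM
    relabeling step would supply the pairs in the real pipeline.
    """
    scores = defaultdict(float)
    for label, conf in relabeled_views:
        scores[label] += conf
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Toy example: two views of one object with per-view label distributions.
views = {
    "front": {"mug": 0.9, "bowl": 0.1},   # descriptive viewpoint
    "top":   {"mug": 0.4, "bowl": 0.6},   # ambiguous viewpoint
}
ambiguous = select_next_view(views)        # "top" (higher entropy)

# Hypothetical LLM relabeling output for the sampled views.
fused = fuse_labels([("mug", 0.9), ("bowl", 0.6), ("mug", 0.4)])
best = max(fused, key=fused.get)           # "mug"
```

The fused distribution keeps the minority hypothesis ("bowl") with reduced weight rather than discarding it, which loosely mirrors the stated balance between detail preservation and reduced ambiguity.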
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21740