MVDGC: Joint 3D and 2D Multi-view Pedestrian Detection via Dual Geometric Constraints

TMLR Paper7670 Authors

24 Feb 2026 (modified: 22 Jun 2026). Decision pending for TMLR. License: CC BY 4.0
Abstract: The core challenge in multi-view pedestrian detection (MVPD) lies in effectively aggregating visual features from different viewpoints for robust occlusion reasoning. Recent approaches address this by first projecting image-view features onto a Bird's Eye View (BEV) map, where ground localization is then performed. Despite impressive performance, the perspective transformation induces severe distortion, breaking spatial structure and degrading the quality of object feature extraction. The resulting blurred and ambiguous features hinder accurate BEV point localization, especially in densely populated regions. Moreover, the strong mutual relationship between the BEV ground point and the image bounding boxes is left unexploited. Although multi-view consistency of 2D detections can serve as a powerful constraint in BEV space, these detections are commonly treated as auxiliary signals rather than being jointly optimized with the primary task. In this work, we propose MVDGC, a unified framework that jointly estimates pedestrian locations on the BEV plane and 2D bounding boxes in image views. MVDGC employs a sparse set of 3D cylindrical queries that embrace geometric context across both BEV and image views, enforcing dual spatial constraints for precise localization. Specifically, the geometric constraints are established by modeling each pedestrian as a vertical cylinder whose center lies on the BEV plane and whose projection casts a rectangular box in the image views. These queries function as shape anchors that directly extract 2D features from the intact image-view features using camera projection, eliminating projection-induced distortions. The 3D cylindrical query unifies BEV and image-view (ImV) localization into a single task: 3D cylinder position and shape refinement.
Extensive experiments and ablation studies demonstrate that MVDGC achieves state-of-the-art performance across multiple evaluation metrics on MVPD benchmarks, including WildTrack and MultiViewX. On the generalized multi-view detection (GMVD) dataset, MVDGC achieves the highest MODP and precision, while maintaining competitive performance on the remaining metrics, highlighting its robustness and generalization to unseen scene configurations. Code is available at: https://github.com/UARK-AICV/MVDGC
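The cylinder-to-box relationship described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names (`project_point`, `cylinder_to_box`), the circle-sampling strategy, and the example camera matrix `EXAMPLE_P` are all assumptions made for illustration, using a standard pinhole camera with a 3x4 projection matrix and a ground plane at z = 0.

```python
import math

def project_point(P, X):
    """Project a 3D world point X = (x, y, z) with a 3x4 pinhole matrix P.

    Returns pixel coordinates (u, v), or None if the point lies behind the camera.
    """
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous world coordinates
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return (u / w, v / w)

def cylinder_to_box(P, cx, cy, radius, height, n_samples=16):
    """Approximate the image box of a vertical cylinder standing on z = 0.

    (cx, cy) is the cylinder centre on the ground (BEV) plane. Points are
    sampled on the bottom and top circles, projected, and enclosed in an
    axis-aligned box (u_min, v_min, u_max, v_max).
    """
    pts = []
    for k in range(n_samples):
        theta = 2.0 * math.pi * k / n_samples
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        for z in (0.0, height):
            uv = project_point(P, (x, y, z))
            if uv is not None:
                pts.append(uv)
    if not pts:
        return None  # cylinder is entirely behind the camera
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return (min(us), min(vs), max(us), max(vs))

# Hypothetical camera: focal length 1000 px, principal point (960, 540),
# placed 1 m above the ground and looking horizontally along the world +y axis.
EXAMPLE_P = [
    [1000.0, 960.0, 0.0, 0.0],
    [0.0, 540.0, -1000.0, 1000.0],
    [0.0, 1.0, 0.0, 0.0],
]

# A 2 m tall cylinder of radius 0.25 m standing 10 m in front of the camera.
box = cylinder_to_box(EXAMPLE_P, cx=0.0, cy=10.0, radius=0.25, height=2.0)
print(box)
```

Because the same cylinder parameters determine both the ground point and the box in every calibrated view, refining the cylinder updates the BEV location and all 2D boxes together, which is the coupling the abstract refers to as dual geometric constraints.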
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=2gnY6nW3M6
Changes Since Last Submission:
- Modified the abstract following Reviewer twWV (Weakness #1 and Requested Change #1).
- Clarified the tracking intention in Section 4.5.2 following Reviewers m7Zm, twWV, and PaA7.
- Added a detailed walk-through of the qualitative comparison in Section 4.6 following Reviewer m7Zm.
- Added information to the ablation study tables and reorganized them in Section 4.7 following Reviewer twWV.
- Added acknowledgements.
Code: https://github.com/UARK-AICV/MVDGC
Assigned Action Editor: ~Lei_Wang13
Submission Number: 7670