MVDGC: Joint 3D and 2D Multi-view Pedestrian Detection via Dual Geometric Constraints

TMLR Paper7670 Authors

24 Feb 2026 (modified: 28 Feb 2026), Under review for TMLR, CC BY 4.0
Abstract: The core challenge in multi-view pedestrian detection (MVPD) lies in effectively aggregating visual features from different viewpoints for robust occlusion reasoning. Recent approaches address this by first projecting image-view (ImV) features onto a Bird's Eye View (BEV) map, where ground localization is then performed. Despite impressive performance, the perspective transformation induces severe distortion, which breaks spatial structure and degrades the quality of object feature extraction. The resulting blurred and ambiguous features hinder accurate BEV point localization, especially in densely populated regions. Moreover, the strong mutual relationship between the BEV ground point and the image bounding boxes is not exploited. Although multi-view consistency of 2D detections can serve as a powerful constraint in BEV space, these detections are commonly treated as auxiliary signals rather than being jointly optimized with the primary task. In this work, we propose MVDGC, a unified framework that jointly estimates pedestrian locations on the BEV plane and 2D bounding boxes in the image views. MVDGC employs a sparse set of 3D cylindrical queries that capture geometric context across both the BEV and the image views, enforcing dual spatial constraints for precise localization. Specifically, the geometric constraints are established by modeling each pedestrian as a vertical cylinder whose center lies on the BEV plane and whose projection casts a rectangular box in each image view. These queries function as shape anchors that directly extract 2D features from the intact image-view features using camera projection, eliminating projection-induced distortions. The 3D cylindrical query unifies BEV and ImV localization into a single task: refinement of the 3D cylinder's position and shape.
Extensive experiments and ablation studies demonstrate that MVDGC achieves state-of-the-art performance across multiple evaluation metrics on MVPD benchmarks, including Wildtrack and MultiviewX, as well as on the generalized multi-view detection (GMVD) dataset. Moreover, by explicitly modeling BEV-ImV coherence through cylindrical queries, MVDGC not only delivers high precision in multi-view detection but also surpasses image-based tracking methods in a single-view scenario. Code will be made available upon acceptance.
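The cylinder-to-box relationship described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ideal pinhole camera without lens distortion, a ground plane at z = 0, and a box obtained by sampling points on the cylinder's top and bottom rims. The function name `project_cylinder_to_box`, its parameters, and the example camera are all hypothetical.

```python
import numpy as np

def project_cylinder_to_box(center_xy, radius, height, K, R, t, n_samples=16):
    """Project a vertical cylinder standing on the ground plane (z = 0)
    into one camera and return the enclosing box (x_min, y_min, x_max, y_max).

    Assumes a pinhole model x ~ K (R X + t). The box is taken over sampled
    rim points, so it slightly underestimates the exact silhouette.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    ring_x = center_xy[0] + radius * np.cos(angles)
    ring_y = center_xy[1] + radius * np.sin(angles)
    bottom = np.stack([ring_x, ring_y, np.zeros(n_samples)], axis=1)
    top = np.stack([ring_x, ring_y, np.full(n_samples, height)], axis=1)
    points_world = np.concatenate([bottom, top], axis=0)  # shape (2n, 3)

    # World -> camera coordinates.
    points_cam = points_world @ R.T + t
    if np.any(points_cam[:, 2] <= 0):
        return None  # part of the cylinder lies behind the camera

    # Camera -> pixel coordinates (perspective division).
    pixels_h = points_cam @ K.T
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]

    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)

# Hypothetical camera: 1 m above the ground at (0, -10, 1), looking along +Y.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 1.0, 10.0])  # t = -R @ camera_position

# A pedestrian-sized cylinder at the world origin.
box = project_cylinder_to_box((0.0, 0.0), radius=0.25, height=1.8, K=K, R=R, t=t)
print(box)
```

Under these assumptions, moving the cylinder on the ground plane or changing its radius and height changes the box in every view at once, which is the kind of coupling between BEV position and image boxes that the cylindrical queries are described as exploiting.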
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=2gnY6nW3M6
Changes Since Last Submission: - The previous submission was desk-rejected due to the presence of funding statements. Accordingly, we have removed these statements. - We added a new figure illustrating the framework pipeline, together with an accompanying explanation.
Assigned Action Editor: ~Lei_Wang13
Submission Number: 7670