Calibration-Free View-Agnostic Monocular 3D Object Detection for Urban Scenes
Keywords: monocular 3D object detection, calibration-free perception, bird's-eye view, V2X cooperative perception, cross-view generalization
TL;DR: A single keypoint-based monocular 3D detector that generalizes across ego-vehicle, infrastructure, and aerial cameras without calibration, enabling plug-and-play V2X cooperative perception.
Abstract: Cooperative vehicle-to-everything (V2X) perception requires 3D object detection across heterogeneous cameras whose intrinsic parameters may be unavailable, imprecise, or drifting. We present UrbanOmniDetect, a calibration-free monocular 3D object detection framework that predicts ordered 2D projections of 3D bounding box vertices from a single RGB image. By formulating 3D detection as keypoint regression within a backbone-agnostic single-stage architecture, a single model generalizes across ego-vehicle, infrastructure, and aerial viewpoints without camera intrinsics or scene priors. We construct the UrbanOmniView dataset by unifying KITTI, DAIR-V2X, and high-fidelity Unreal Engine 5 synthetic data (4K, ray-traced) spanning ground-level, traffic-surveillance, and drone perspectives. A homography-based bird's-eye-view head maps predicted ground-contact keypoints to a top-down plane, enforcing geometric consistency without camera parameters. We experiment with YOLO11 backbone variants at multiple scales and augmented feature pyramid levels. On the KITTI benchmark, our best model achieves AP_3D = 30.71 (Moderate) and AP_BEV = 35.19 at IoU >= 0.7, outperforming calibration-dependent baselines on the Moderate and Hard splits, with an mAP_50:95 of 0.751 and 10 ms inference on an A100 GPU. Calibration-dependent baselines degrade catastrophically under small intrinsic perturbations, whereas our formulation is invariant by construction. UrbanOmniDetect provides a deployment-ready framework for autonomous driving, drone surveillance, and V2X cooperative perception.
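Illustration: the abstract's homography-based bird's-eye-view mapping can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the homography is fit from four image-to-ground reference correspondences, whereas the abstract does not say how the BEV head obtains it. The names `fit_homography` and `to_bev`, the reference points, and the keypoint coordinates are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Fit a 3x3 homography (h33 fixed to 1) from exactly four correspondences."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a: List[List[float]] = []
    b: List[float] = []
    for (u, v), (x, y) in zip(src, dst):
        # x = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), and likewise for y
        a.append([u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x])
        b.append(x)
        a.append([0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y])
        b.append(y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_bev(h: List[List[float]], pts: Sequence[Point]) -> List[Point]:
    """Map image-plane points to the top-down plane via the homography h."""
    out = []
    for u, v in pts:
        w = h[2][0] * u + h[2][1] * v + h[2][2]
        x = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
        y = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
        out.append((x, y))
    return out


# Hypothetical reference points: an image-space trapezoid (pixels) and the
# ground-plane rectangle (metres) it is assumed to correspond to.
IMG_REF = [(400.0, 700.0), (880.0, 700.0), (700.0, 400.0), (580.0, 400.0)]
GROUND_REF = [(-2.0, 5.0), (2.0, 5.0), (2.0, 30.0), (-2.0, 30.0)]
H = fit_homography(IMG_REF, GROUND_REF)

# Hypothetical predicted ground-contact keypoints: the four bottom box vertices.
ground_keypoints = [(560.0, 560.0), (690.0, 560.0), (670.0, 500.0), (585.0, 500.0)]
footprint = to_bev(H, ground_keypoints)
center = (sum(p[0] for p in footprint) / 4, sum(p[1] for p in footprint) / 4)
print("BEV footprint:", footprint)
print("BEV center:", center)
```

The mapping itself uses no camera intrinsics, only the image-to-plane transform. Consistency checks such as footprint rectangularity could then be computed on `footprint`, though the abstract does not specify which constraints the BEV head enforces.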
Submission Number: 13