Abstract: When detecting objects, depth sensors are not always available, requiring 3D object detection from monocular images. However, for many object classes, datasets with 3D annotations are missing. Recent monocular 3D object detection methods lack the semantic diversity needed for autonomous systems because 3D ground-truth data is missing for static classes such as poles and traffic lights. To close this gap, we combine a large-scale dataset for 2D object detection with an unlabeled dataset containing depth measurements. We lift 2D object detections on the depth dataset into the 3D domain by associating each detection with its corresponding depth values. This leverages 2D-annotated datasets to enable semantically rich 3D object detection without extra labelling effort. We train an object detection model with mixed batches and evaluate it by comparing the predicted depth with the projected center-point depth of cars manually annotated in 3D space. The result is a monocular object detector that can predict 3D positions for up to 37 static and dynamic object classes from camera images alone.
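The abstract does not spell out the lifting step in detail; a minimal sketch of one common way to lift a 2D detection into 3D is shown below, assuming a pinhole camera with known intrinsics (fx, fy, cx, cy) and a depth map aligned with the image. The function name, the use of the box-center pixel, and the median-depth aggregation are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def lift_detection_to_3d(bbox_xyxy, depth_map, fx, fy, cx, cy):
    """Back-project the center of a 2D detection into 3D camera coordinates.

    Illustrative sketch only: assumes a pinhole camera with known intrinsics
    and a dense depth map aligned with the image; the paper's actual
    association of detections with depth values may differ.
    """
    x1, y1, x2, y2 = bbox_xyxy
    u = 0.5 * (x1 + x2)          # horizontal center of the box (pixels)
    v = 0.5 * (y1 + y2)          # vertical center of the box (pixels)

    # Use the median valid depth inside the box as a robust estimate of the
    # object's center-point depth (hypothetical choice, not from the paper).
    patch = depth_map[int(y1):int(y2), int(x1):int(x2)]
    z = float(np.median(patch[patch > 0]))

    # Pinhole back-projection: pixel (u, v) at depth z -> 3D point (X, Y, Z).
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    return np.array([X, Y, z])

# Example usage with dummy data.
depth = np.full((480, 640), 12.0)                # constant 12 m depth
point = lift_detection_to_3d((300, 200, 340, 260), depth, 600, 600, 320, 240)
print(point)                                     # approx. [0.0, -0.2, 12.0]
```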