A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection

Published: 11 Aug 2024, Last Modified: 20 Sept 2024
Venue: ECCV 2024 W-CODA Workshop Full Paper Track
License: CC BY 4.0
Keywords: 3D Object Detection, Multimodal, Autonomous Driving
TL;DR: We propose a hybrid multimodal fusion architecture for 3D object detection that leverages both late and cascade fusion principles, showing significant performance improvements on the KITTI benchmark.
Subject: 3D object detection and scene understanding
Confirmation: I have read and agree with the submission policies of ECCV 2024 and the W-CODA Workshop on behalf of myself and my co-authors.
Abstract: We present a new method for detecting 3D objects from multimodal inputs. It leverages both LiDAR and RGB cameras in a hybrid late-cascade scheme that combines an RGB detection network with a 3D LiDAR detector. We exploit late fusion principles to reduce LiDAR false positives, matching LiDAR detections with RGB ones by projecting the LiDAR detections onto the image. We rely on cascade fusion principles to recover LiDAR false negatives by leveraging epipolar constraints and multiple frustums generated by RGB detections from separate views. Our solution can be plugged on top of any underlying single-modal detectors, enabling a flexible training process that can either take advantage of pre-trained LiDAR and RGB detectors or train the two branches separately. We evaluate our results on the KITTI benchmark, showing significant performance improvements, especially for the detection of Pedestrians and Cyclists.
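The late-fusion step described above (project LiDAR detections onto the image, keep only those that match an RGB detection) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions: each LiDAR box is given as its 8 corners already in camera coordinates and in front of the camera, P is a 3x4 camera projection matrix, matching uses the best 2D IoU against a hypothetical threshold of 0.5, and class labels are ignored. The helper frustum_points hints at the cascade branch by gathering the points inside the frustum of an unmatched RGB detection for a single view. The multi-view epipolar reasoning from the abstract is not sketched. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def project_point(P: Sequence[Sequence[float]], X: Point3D) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates with a 3x4 matrix P."""
    h = (*X, 1.0)  # homogeneous coordinates
    u, v, w = [sum(P[r][c] * h[c] for c in range(4)) for r in range(3)]
    return u / w, v / w  # assumes w > 0 (point in front of the camera)


def box_corners_to_2d(P, corners: Sequence[Point3D]) -> Box2D:
    """Axis-aligned 2D box enclosing the projected 3D box corners."""
    pts = [project_point(P, c) for c in corners]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def late_fusion_filter(P, lidar_boxes, rgb_boxes, iou_thr: float = 0.5):
    """Keep LiDAR detections whose projection overlaps some RGB detection.

    Returns (indices of kept LiDAR boxes, indices of unmatched RGB boxes).
    Unmatched RGB boxes are candidate LiDAR false negatives for the cascade step.
    """
    kept: List[int] = []
    matched_rgb = set()
    for i, corners in enumerate(lidar_boxes):
        proj = box_corners_to_2d(P, corners)
        best_j, best = -1, 0.0
        for j, rb in enumerate(rgb_boxes):
            score = iou(proj, rb)
            if score > best:
                best_j, best = j, score
        if best_j >= 0 and best >= iou_thr:  # no RGB support -> likely false positive
            kept.append(i)
            matched_rgb.add(best_j)
    unmatched = [j for j in range(len(rgb_boxes)) if j not in matched_rgb]
    return kept, unmatched


def frustum_points(P, points: Sequence[Point3D], box: Box2D) -> List[Point3D]:
    """Single-view frustum: points in front of the camera projecting inside box."""
    out = []
    for X in points:
        if X[2] <= 0:  # behind the camera, cannot project into the image
            continue
        u, v = project_point(P, X)
        if box[0] <= u <= box[2] and box[1] <= v <= box[3]:
            out.append(X)
    return out
```

As a usage example, with a pinhole matrix such as P = [[100, 0, 50, 0], [0, 100, 50, 0], [0, 0, 1, 0]], a unit cube centered at (0, 0, 10) projects onto roughly (38.9, 38.9, 61.1, 61.1) and is kept if an RGB box (40, 40, 60, 60) exists. A cube at (5, 0, 10) with no overlapping RGB box is dropped. Any RGB box left unmatched is then passed to frustum_points to collect candidate LiDAR points for recovery.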
Submission Number: 17