FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection
Abstract: Fusing data from millimeter-wave Radar sensors and high-definition cameras has emerged as a viable approach to precise 3D object detection for roadside traffic surveillance. For roadside perception systems, earlier studies have pointed out that it is better to perform the fusion on the 2D image plane than on the BEV plane (which is popular for on-vehicle perception systems), especially when the perception range is large (e.g., > 150 m). Image-plane fusion requires critical view transformations, such as perspective projection from the Radar's BEV to the camera's 2D image plane and, in the reverse direction, inverse perspective mapping (IPM). However, real-world issues such as uneven terrain and sensor movement degrade the precision of these transformations and thus the effectiveness of the fusion. To alleviate these issues, we propose a geometry-based Radar-camera fusion method on the ground, namely FARFusion V2. Specifically, we extend the ground-plane assumption in FARFusion [20] to support arbitrary ground shapes by formulating the ground height as an implicit representation based on geometric transformations. By incorporating this ground information, we enhance the Radar data with target height measurements. The enhanced Radar data can then be projected onto the 2D image plane to obtain more accurate depth information, which in turn assists the IPM process. A real-time transformation parameter estimation module is further introduced to refine the view transformation processes. Moreover, considering the differing measurement noise of the two sensors, we introduce an uncertainty-based depth fusion strategy into the 2D fusion process to maximize the probability of obtaining the optimal depth value. Extensive experiments are conducted on our collected roadside OWL benchmark, demonstrating the excellent far-range localization capacity of FARFusion V2. Our method achieves an average localization accuracy of 0.771 m when the detection range is extended up to 500 m.
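To make the two view-transformation ingredients mentioned in the abstract concrete, below is a minimal, self-contained Python sketch (with assumed names such as `pixel_to_ground` and `fuse_depths`; this is an illustration, not code from the paper): (i) intersecting a camera ray with a ground patch of known height, the basic inverse perspective mapping (IPM) step, and (ii) combining a camera-derived and a Radar-derived depth by inverse-variance weighting, one common form of uncertainty-based depth fusion.

```python
import numpy as np


def pixel_to_ground(uv, K, R, t, ground_z=0.0):
    """Intersect the camera ray through pixel `uv` with the plane z = ground_z.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation, i.e.
    x_cam = R @ x_world + t. Returns the 3D ground point in world coordinates.
    """
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_world = R.T @ ray_cam                        # ray direction in the world frame
    cam_center = -R.T @ t                            # camera center in the world frame
    s = (ground_z - cam_center[2]) / ray_world[2]    # scale that reaches the ground plane
    return cam_center + s * ray_world


def fuse_depths(d_cam, var_cam, d_radar, var_radar):
    """Inverse-variance weighted depth (maximum likelihood under Gaussian noise)."""
    w_cam, w_radar = 1.0 / var_cam, 1.0 / var_radar
    depth = (w_cam * d_cam + w_radar * d_radar) / (w_cam + w_radar)
    variance = 1.0 / (w_cam + w_radar)
    return depth, variance


if __name__ == "__main__":
    # Toy setup: a nadir-looking camera 5 m above a flat ground patch (z up).
    K = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
    R = np.diag([1.0, -1.0, -1.0])             # optical axis pointing straight down
    t = -R @ np.array([0.0, 0.0, 5.0])         # camera center at (0, 0, 5)
    print(pixel_to_ground((960.0, 540.0), K, R, t))   # ~ [0, 0, 0]

    # Camera depth 152 m (std 8 m) vs. Radar depth 148 m (std 1.5 m):
    # the fused estimate is pulled strongly toward the lower-variance Radar value.
    print(fuse_depths(152.0, 8.0 ** 2, 148.0, 1.5 ** 2))
```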
Primary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: This work presents a multimodal fusion approach that integrates roadside millimeter-wave Radar and high-definition camera data, aiming to combine the two modalities more effectively for joint perception. The approach targets 3D object detection in practical far-range (e.g., > 150 m) traffic scenes. However, real-world uneven terrain and sensor movement adversely affect the precise view transformations required at such ranges. To alleviate these issues, we implement three key strategies: (1) We incorporate ground information to augment the Radar data and enhance the view transformation processes. (2) We estimate transformation parameters in real time, refining the view transformation for each frame. (3) We employ an uncertainty-based depth fusion method to compute more precise depth values. All these designs aim to improve multimodal fusion in real-world conditions; consequently, our work is highly relevant to this conference.
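As an illustration of strategy (2), the following hedged sketch (assumed names, not the paper's module) refines a single extrinsic parameter per frame, a small camera pitch offset, by minimizing the reprojection error between Radar-derived ground points and their matched image detections; the actual module presumably estimates a richer set of transformation parameters.

```python
import numpy as np


def rot_x(angle):
    """Rotation about the camera x-axis (pitch)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])


def project(points_world, K, R, t):
    """Project Nx3 world points with x_cam = R @ x_world + t; returns Nx2 pixels."""
    pts_cam = points_world @ R.T + t
    pts_img = pts_cam @ K.T
    return pts_img[:, :2] / pts_img[:, 2:3]


def refine_pitch(points_world, pixels, K, R0, t,
                 search=np.deg2rad(2.0), steps=201):
    """Grid-search a small pitch offset that best aligns projections with detections."""
    offsets = np.linspace(-search, search, steps)
    errors = []
    for d in offsets:
        R = rot_x(d) @ R0
        err = np.linalg.norm(project(points_world, K, R, t) - pixels, axis=1).mean()
        errors.append(err)
    best = offsets[int(np.argmin(errors))]
    return rot_x(best) @ R0, best
```

A coarse 1-D grid search is used here only to keep the example dependency-free; in practice a gradient-based least-squares solver over the full parameter set would be the more typical choice.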
Supplementary Material: zip
Submission Number: 2691