A review of point cloud and image cross-modal fusion for self-driving

Published: 01 Jan 2022 · Last Modified: 10 Apr 2025 · CIS 2022 · CC BY-SA 4.0
Abstract: Environmental perception for self-driving has become a hot topic in computer vision. The field has made great progress in the past decade, and many deep learning frameworks based on single-modal data have emerged for self-driving perception tasks. However, perception systems based on single-modal information still have limitations. Recently, researchers have addressed this problem through cross-modal fusion of point clouds and images and have obtained more satisfactory results. It is therefore timely to provide a comprehensive review of this recent research. In this paper, we summarize the different fusion methods, cross-modal datasets, and evaluation metrics, organized around three phases of self-driving perception tasks. We then analyze the challenges of point cloud and image fusion and discuss future prospects. Based on these observations, we hope this paper can provide research directions for cross-modal fusion of point clouds and images in self-driving.