Exploiting Multi-Modal Synergies for Enhancing 3D Multi-Object Tracking

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · IEEE Robotics Autom. Lett. 2024 · CC BY-SA 4.0
Abstract: 3D Multi-Object Tracking (MOT) aims to establish and maintain consistent object trajectories in continuously dynamic environments. At present, tracking-by-detection has emerged as the dominant paradigm for 3D MOT due to its simplicity and efficiency. However, this paradigm depends heavily on the performance of 3D object detection, which usually fails to handle crowded or occluded scenarios. Some studies have noted that introducing 2D object detection can help to enhance 3D MOT, but they often ignore the synergies between multi-modal data sources. In this letter, we aim to fully exploit multi-modal synergies for enhancing 3D multi-object tracking with a tracker named EMMS-MOT, which significantly reduces tracking failures caused by low-quality detections and similar motion distractors. Specifically, we first propose a Multi-Modal Location Coordinator (MMLC), which enhances 3D detections with 2D detections by imposing spatial synergies between the two modalities. In addition, we design a Multi-Modal Motion Estimator (MMME), which corrects tracklet motion states by simultaneously modelling 2D and 3D motions via an extended Kalman filter. Experimental results on the two public datasets KITTI and nuScenes demonstrate that our proposed EMMS-MOT tracker outperforms state-of-the-art approaches.
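The joint 2D/3D motion-state idea behind the MMME can be illustrated with a minimal sketch. The abstract does not give the filter's state layout, so everything below is an assumption: a single Kalman filter over a hypothetical 10-D state (3D box centre with velocity plus 2D image-plane centre with velocity), updated with fused 3D and 2D centre measurements. The paper uses an extended Kalman filter; this linear constant-velocity version only shows the simultaneous-modelling structure, not the actual EMMS-MOT formulation.

```python
import numpy as np

# Assumed joint state (not from the paper):
# [x, y, z, vx, vy, vz, u, v, vu, vv]
# 3D centre + velocity, then 2D image centre + velocity.

DT = 0.1  # assumed frame interval in seconds


def make_F(dt=DT):
    """Constant-velocity transition matrix for the 10-D joint state."""
    F = np.eye(10)
    for pos, vel in [(0, 3), (1, 4), (2, 5), (6, 8), (7, 9)]:
        F[pos, vel] = dt  # position += velocity * dt
    return F


# Measurement model: we observe the 3D centre (x, y, z) from the
# 3D detector and the 2D centre (u, v) from the 2D detector.
H = np.zeros((5, 10))
H[0, 0] = H[1, 1] = H[2, 2] = 1.0
H[3, 6] = H[4, 7] = 1.0


def predict(x, P, F, q=1e-2):
    """Propagate state and covariance one frame forward."""
    x = F @ x
    P = F @ P @ F.T + q * np.eye(10)
    return x, P


def update(x, P, z, r=1e-1):
    """Correct the joint state with a fused 3D + 2D measurement z (5-D)."""
    S = H @ P @ H.T + r * np.eye(5)       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(10) - K @ H) @ P
    return x, P
```

Because the 2D and 3D components share one covariance matrix, an informative measurement in either modality tightens the whole state, which is one plausible reading of how joint modelling can correct tracklet motion when one detector degrades.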