Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · Neural Comput. Appl. 2024 · CC BY-SA 4.0
Abstract: The performance of Multi-Object Tracking (MOT) has recently been improved by using discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in unsatisfactory tracking performance. We therefore design an end-to-end MOT model based on Graph Convolutional Neural Networks (GCNNs) that fuses four classes of features characterizing objects: appearance, motion, appearance interactions, and motion interactions. Specifically, a Re-Identification (Re-ID) module extracts discriminative appearance features, and the appearance features within each tracklet are averaged to simplify the proposed tracker. We then design two GCNNs to better distinguish objects: one extracts interactive appearance features and the other interactive motion features. A fusion module combines these features to obtain a global feature similarity, based on which an association component computes the MOT matching results. Finally, we semantically visualize the relevant graph structures with GNNExplainer to gain insight into the proposed tracker. Evaluation on the MOT16 and MOT17 benchmarks shows that our model outperforms state-of-the-art online tracking methods in Multi-Object Tracking Accuracy (MOTA) and Identification F1 (IDF1) score, which is consistent with the results from GNNExplainer.
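The pipeline the abstract describes (tracklet feature averaging, graph convolution over an interaction graph, feature fusion, and similarity-based association) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the single GCN layer, the fully connected interaction graph, cosine similarity, and the Hungarian matcher are all assumptions standing in for the paper's actual modules.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def average_tracklet_features(tracklet_feats):
    # Average a tracklet's per-frame appearance features (the simplification
    # step mentioned in the abstract).
    return tracklet_feats.mean(axis=0)

def gcn_layer(X, A, W):
    # One standard GCN layer: add self-loops, symmetrically normalize the
    # adjacency, aggregate neighbors, apply a linear map and ReLU.
    # (The paper's two GCNNs are more elaborate; this is a stand-in.)
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

def cosine_similarity(T, D):
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return T @ D.T

# Toy data: 3 tracklets x 4 frames x 8-dim appearance features, 3 detections.
tracklets = rng.normal(size=(3, 4, 8))
track_feats = np.stack([average_tracklet_features(t) for t in tracklets])
det_feats = rng.normal(size=(3, 8))

# Interaction graph over all tracklet and detection nodes (fully connected
# here purely for illustration).
X = np.vstack([track_feats, det_feats])
A = np.ones((6, 6)) - np.eye(6)
W = rng.normal(size=(8, 8))
H = gcn_layer(X, A, W)

# Fuse plain and interactive features, compute the global similarity, and
# associate tracklets to detections with the Hungarian algorithm.
fused_tracks = np.hstack([track_feats, H[:3]])
fused_dets = np.hstack([det_feats, H[3:]])
S = cosine_similarity(fused_tracks, fused_dets)
rows, cols = linear_sum_assignment(-S)  # negate to maximize similarity
```

In a real tracker, the similarity matrix would also incorporate motion and interactive-motion terms, and low-similarity matches would be gated before being committed.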