Erasure-based interaction network for red-green-blue and thermal object detection and a unified benchmark
Abstract: Although many breakthroughs have recently been made in video object detection, performance remains limited by the imaging limitations of RGB (red-green-blue) sensors under adverse illumination conditions. To alleviate this issue, this work introduces a new computer vision task, RGBT (red-green-blue and thermal) video object detection, which incorporates the thermal modality because it is insensitive to adverse illumination. To promote research on RGBT video object detection, we design a novel Erasure-based Interaction Network (EINet) and establish a comprehensive benchmark dataset for this task. Traditional methods typically exploit temporal information through many auxiliary frames and therefore incur a large computational burden. Since thermal images exhibit less noise than RGB images, we develop a negative activation function that erases the noise in RGB features with the help of thermal image features. Furthermore, benefiting from the thermal images, we rely on only a small temporal window to model spatio-temporal information, which greatly improves efficiency while maintaining detection accuracy. Our dataset consists of 50 pairs of RGBT video sequences collected in real traffic scenarios, covering complex backgrounds, various objects, and different illumination conditions. Extensive experiments on the proposed dataset demonstrate the effectiveness and efficiency of EINet. Compared with existing detectors, EINet achieves a relatively balanced performance, with a detection accuracy of 46.3% and a speed of 92.6 frames per second. The project will be released for free academic use at https://github.com/tzz-ahu.
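To make the erasure idea concrete, below is a minimal sketch, not the authors' implementation, of one plausible form of such an interaction: the cleaner thermal features drive a per-pixel gate whose negative activation suppresses (erases) noisy responses in the RGB features before fusion. The module name `EraseInteraction`, the 1x1 convolutions, and the specific gating formula are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class EraseInteraction(nn.Module):
    """Hypothetical erasure-style RGB-thermal interaction (illustrative sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv maps thermal features to a per-pixel, per-channel erasure gate
        self.gate_conv = nn.Conv2d(channels, channels, kernel_size=1)
        # 1x1 conv fuses the cleaned RGB features with the thermal features
        self.fuse_conv = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        # "Negative activation": the gate lies in (-1, 0); it is strongly negative
        # where the thermal branch flags a location as noise, near zero elsewhere.
        gate = -torch.sigmoid(self.gate_conv(thermal_feat))
        # Adding gate * rgb_feat attenuates (erases) the flagged RGB responses.
        erased_rgb = rgb_feat + gate * rgb_feat
        # Fuse the cleaned RGB features with the thermal features.
        return self.fuse_conv(torch.cat([erased_rgb, thermal_feat], dim=1))


if __name__ == "__main__":
    rgb = torch.randn(1, 256, 64, 80)      # backbone features from an RGB frame
    thermal = torch.randn(1, 256, 64, 80)  # aligned features from the thermal frame
    out = EraseInteraction(256)(rgb, thermal)
    print(out.shape)  # torch.Size([1, 256, 64, 80])
```

The design choice illustrated here is that erasure is subtractive rather than additive: the thermal branch only decides where to attenuate RGB responses, so in well-lit scenes with little noise the gate stays near zero and the RGB features pass through largely unchanged.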