Abstract: Fire hazards cause great harm to human beings and nature. Advances in computer vision have made it possible to detect fires early through surveillance video. Recent work widely uses CNNs for fire detection, but CNNs cannot model long-range dependencies and handle global features poorly. In early fire detection, the flame target is small and its color characteristics are not obvious, so previous fire detection methods perform poorly. The Transformer's strong feature-processing capacity and growing success in vision highlight its potential and provide new ideas, but its heavy computation reduces detection speed. Therefore, in this paper we design a network combining CNN and Transformer, GLCT, which models both global and local information and strikes a balance between accuracy and speed. In the backbone, MobileLP, a linear highlighted attention mechanism reduces the amount of computation, and locality is introduced into the feed-forward network. Feature fusion is performed by combining the designed backbone with BiFPN, and the full fire detection model is completed with a YOLO Head. On surveillance video images of early fires, our network outperforms representative object detection methods, including YOLOv4, MobileViT, and PVTv2 and its variants, demonstrating its reliability for early fire detection.
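To illustrate why linearizing attention reduces computation, the following is a minimal NumPy sketch. It is an assumption for illustration only, not the paper's exact "linear highlighted attention": it uses the generic kernel trick of applying a feature map phi to Q and K, so that (phi(Q) phi(K)^T) V can be regrouped as phi(Q) (phi(K)^T V), replacing the O(N^2 d) score matrix of standard softmax attention with O(N d^2) work.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an N x N score matrix, O(N^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Linearized attention with feature map phi(x) = elu(x) + 1.

    Regrouping the matrix product avoids the N x N matrix:
    phi(K).T @ V is only d x d, so the total cost is O(N * d^2).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                        # d x d summary of keys and values
    Z = Qp @ Kp.sum(axis=0) + eps        # per-query normalizer, length N
    return (Qp @ KV) / Z[:, None]

# Toy example: N tokens with head dimension d.
rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

For small N the two variants cost about the same, but for long token sequences (e.g. high-resolution feature maps in detection) the linear form avoids the quadratic blow-up in sequence length.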