Abstract: The large-scale surveillance video analysis becomes important as the development of intelligent city. The heavy computation resources neccessary for state-of-the-art deep learning model makes the real-time processing hard to be implemented. This paper exploits the characteristic of high scene similarity generally existing in surveillance videos and proposes dynamic convolution reusing the previous feature map to reduce the computation amount. We tested the proposed method on 45 surveillance videos with various scenes. The experimental results show that dynamic convolution can reduce up to 75.7% of FLOPs while preserving the precision within 0.7% mAP. Furthermore, the dynamic convolution can enhance the processing time up to 2.2 times.
Keywords: CNN optimization, Reduction on convolution calculation, dynamic convolution, surveillance video
TL;DR: An optimizing architecture on CNN for surveillance videos with 75.7% reduction on FLOPs and 2.2 times improvement on FPS
6 Replies
Loading