REO: Resource efficient object detection in embedded system using bitstreams

Published: 22 Jul 2024, Last Modified: 27 Jan 2026 · Multimedia Tools and Applications · CC BY-NC-ND 4.0
Abstract: With the rapid development of deep learning methods, intelligent video surveillance technology based on deep learning is being actively applied in closed-circuit television (CCTV) surveillance centers. However, owing to the limited computing power of CCTV control centers, only a few of the many CCTV streams can be analyzed in real time. Compressed video is a good alternative; however, detection performance on compressed video is generally insufficient. We propose a video object detection framework called resource-efficient object detection (REO), which references previous detection results to better exploit compressed-video features. We primarily use the predicted frame (P-frame), obtained at the entropy-decoding stage of high-efficiency video coding (HEVC), to save computing power in power-constrained embedded systems such as CCTV systems. To represent the P-frame information well, we derive two features, the motion vector (MV) and the prediction unit size (PUS), which are combined and mapped into hue, saturation, and value (HSV) space to generate the PUMV feature. The complete REO pipeline improves performance by propagating the detection results of the previous frame to the next frame, with a shallow guide model added to compensate for incomplete P-frame information. Because the P-frame features are extracted with this shallow guide model, real-time processing and embedded-system deployment are facilitated. In addition, the method requires full decoding only for the I-frames, which are sparse, significantly reducing the computational load. Empirical validation shows that the proposed REO produces results comparable to feature-propagation object detection methods despite its computational efficiency, particularly in complex environments.
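The abstract does not specify how MV and PUS are assigned to the HSV channels, but a PUMV-style feature can be sketched as below. This is a hypothetical mapping (MV direction to hue, MV magnitude to saturation, normalized PU size to value), not the paper's exact construction; the function name `pumv_feature` and the `max_pus` normalization constant are assumptions for illustration.

```python
import numpy as np

def pumv_feature(mv_x, mv_y, pus, max_pus=64.0):
    """Combine motion vectors (MV) and prediction-unit sizes (PUS)
    into an HSV-like 3-channel feature map (hypothetical mapping)."""
    angle = np.arctan2(mv_y, mv_x)           # MV direction in [-pi, pi]
    hue = (angle + np.pi) / (2 * np.pi)      # direction -> H in [0, 1]
    mag = np.hypot(mv_x, mv_y)               # MV magnitude
    sat = mag / (mag.max() + 1e-8)           # magnitude -> S in [0, 1]
    val = np.clip(pus / max_pus, 0.0, 1.0)   # PU size -> V in [0, 1]
    return np.stack([hue, sat, val], axis=-1)

# Toy 2x2 grid of motion vectors and PU sizes for one P-frame region
mv_x = np.array([[1.0, 0.0], [-1.0, 0.0]])
mv_y = np.array([[0.0, 1.0], [0.0, -1.0]])
pus = np.array([[64.0, 32.0], [16.0, 8.0]])
feat = pumv_feature(mv_x, mv_y, pus)
print(feat.shape)  # (2, 2, 3): an HSV-like map per block
```

Because all three channels are normalized to [0, 1], the resulting map can be fed to a detector in place of decoded RGB pixels, which is what lets the method skip full decoding for P-frames.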