Abstract: As video data is increasingly consumed by machines rather than solely by humans, there is a growing demand for new compression methods that efficiently accommodate this shift. Because object information extracted from video is crucial for machine consumption, analyzing the similarity between adjacent pictures on the basis of tracked objects can reveal motion-based redundancy, which can then be exploited for video compression. In this paper, we perform object tracking on the input video and assess the similarity between adjacent pictures based on the movement of the tracked objects. We propose a novel picture grouping method that clusters similar adjacent pictures, and we describe the similarity assessment process. After the pictures are grouped by similarity, highly redundant pictures within each group are aggressively resampled in the temporal domain to improve compression efficiency while maintaining machine task performance. In our experiments, the proposed object tracking-based adaptive temporal resampling achieves BD-mAP improvements in object detection of 1.29%, 0.47%, and 2.44% for the Random Access (RA), Low-Delay (LD), and All Intra (AI) modes, respectively, and BD-MOTA improvements in object tracking of 0.02%, 0.61%, and 6.86% for RA, LD, and AI, respectively.
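The abstract does not specify the similarity measure, the grouping criterion, or the resampling ratio, so the following is only a minimal sketch under assumed choices, not the authors' method. It assumes (hypothetically) that the tracker outputs, per picture, a mapping from track ID to object center; that dissimilarity between adjacent pictures is the mean center displacement of objects tracked in both; that a fixed threshold splits the sequence into groups of similar adjacent pictures; and that each group keeps every `step`-th picture. All function names and parameters (`mean_displacement`, `group_pictures`, `resample`, `threshold`, `step`) are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical tracker output for one picture: track ID -> (center_x, center_y).
Frame = Dict[int, Tuple[float, float]]


def mean_displacement(prev: Frame, curr: Frame) -> float:
    """Assumed dissimilarity: mean Euclidean displacement of shared tracks."""
    if not prev and not curr:
        return 0.0  # no objects in either picture: treat as identical
    shared = prev.keys() & curr.keys()
    if not shared:
        return float("inf")  # no common track: treat as dissimilar
    total = 0.0
    for tid in shared:
        (x0, y0), (x1, y1) = prev[tid], curr[tid]
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total / len(shared)


def group_pictures(frames: List[Frame], threshold: float) -> List[List[int]]:
    """Split picture indices into runs of adjacent pictures whose
    displacement stays at or below `threshold` (hypothetical criterion)."""
    if not frames:
        return []
    groups = [[0]]
    for i in range(1, len(frames)):
        if mean_displacement(frames[i - 1], frames[i]) <= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def resample(groups: List[List[int]], step: int) -> List[int]:
    """Temporal resampling: keep every `step`-th picture of each group,
    always including the group's first picture."""
    kept: List[int] = []
    for g in groups:
        kept.extend(g[::step])
    return kept


if __name__ == "__main__":
    # One object moving slowly, then jumping (e.g. a scene change).
    frames = [
        {1: (10.0, 10.0)}, {1: (11.0, 10.0)}, {1: (12.0, 10.0)},
        {1: (13.0, 10.0)}, {1: (60.0, 40.0)}, {1: (61.0, 40.0)},
    ]
    groups = group_pictures(frames, threshold=5.0)
    print(groups)               # [[0, 1, 2, 3], [4, 5]]
    print(resample(groups, 2))  # [0, 2, 4]
```

A fixed `step` is used here for simplicity; an adaptive scheme, as the abstract describes, would instead choose the resampling ratio per group from its measured redundancy.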
External IDs: dblp:conf/avss/AnKSKKJC25