FrameBoost: Advanced Video Analytics With Inference Trigger Frame Selection via Tracking Error Estimation

Published: 2025, Last Modified: 16 Oct 2025, IEEE Access 2025, CC BY-SA 4.0
Abstract: Selecting inference trigger frames is a crucial problem in real-time video analytics. Continuous video streams offer rich data, but running inference on every frame is computationally intensive. Modern systems therefore run inference only on selected frames, called inference trigger frames (ITFs), and use tracking between ITFs to exploit temporal coherence. The ITF selection problem is thus a cornerstone of video analytics, directly affecting both accuracy and efficiency. Beyond selecting the frames that matter most for accuracy, an ITF selection algorithm must itself be efficient: it must extract informative frames while imposing minimal computational overhead on the analytics pipeline. Current approaches such as ‘frame differencing’ methods, while promising, have fundamental limitations. They rely on simple inter-frame changes that often correlate poorly with actual tracking quality, particularly for objects of different sizes. We introduce $\mathsf{FrameBoost}$, an ITF selection method that uses aggregate tracking error (ATE), computed from object-wise IoU predictions, as its metric. $\mathsf{FrameBoost}$ addresses a key limitation of existing methods: their tendency to overemphasize the movements of large objects while undervaluing smaller objects, which are more prone to tracking errors. We identify the key factors affecting IoU prediction performance and develop a lightweight solution that requires minimal computational resources. By estimating tracking quality directly, $\mathsf{FrameBoost}$ makes better-informed decisions about when new detections are truly necessary. Through comprehensive evaluations across diverse scenarios, we demonstrate that $\mathsf{FrameBoost}$ achieves higher accuracy than existing approaches with comparable or lower computational overhead, making it well suited to video analytics systems with different system-level requirements.
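To make the idea of an aggregate tracking error concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify how per-object IoU predictions are produced or combined, so everything here is an assumption for illustration: the function names `iou`, `aggregate_tracking_error`, and `should_trigger_inference`, the aggregation as a sum of `1 - IoU` over objects, and the threshold value. The sketch also shows why the same pixel drift hurts a small object far more than a large one.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def aggregate_tracking_error(predicted_ious: Sequence[float]) -> float:
    """Assumed aggregation: sum of per-object errors (1 - IoU)."""
    return sum(1.0 - min(max(v, 0.0), 1.0) for v in predicted_ious)


def should_trigger_inference(predicted_ious: Sequence[float],
                             threshold: float = 0.5) -> bool:
    """Request a new detection when the aggregate error exceeds the
    (hypothetical) threshold."""
    return aggregate_tracking_error(predicted_ious) > threshold


# The same 5-pixel drift applied to a large and a small object.
large_iou = iou((0, 0, 100, 100), (5, 0, 105, 100))
small_iou = iou((0, 0, 10, 10), (5, 0, 15, 10))
trigger = should_trigger_inference([large_iou, small_iou])
```

In this sketch the small object's IoU drops much further than the large object's under identical drift, so an object-wise metric registers the small object's tracking failure, whereas a pixel-difference measure would be dominated by the large object.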