MA-YOLO: Video Object Detection via Motion-Assisted YOLO

Xinyu Wang, Hong-Shuo Chen, Zhiruo Zhou, Jie-En Yao, C.-C. Jay Kuo

Published: 2025, Last Modified: 05 May 2026, ICIPW 2025, CC BY-SA 4.0

Abstract: Video object detection (VOD) is vital in edge intelligence applications such as surveillance, autonomous systems, and wearable devices. Although high-performance still-image detectors such as YOLO are commonly employed in VOD tasks, applying detection to every video frame incurs redundant computation and a higher energy cost, which makes this approach less attractive for resource-constrained platforms. This paper proposes MA-YOLO (Motion-Assisted YOLO), a lightweight VOD framework tailored for edge environments. Instead of running inference on every frame, MA-YOLO executes complete YOLO detection only on sparse keyframes and propagates the results to intermediate frames using motion information derived from precomputed H.264 motion vectors and geometric offsets from reference detections. We introduce a lightweight XGBoost-based decision module, tailored to the regression of each geometric offset, that efficiently propagates detections from keyframes to non-keyframes. Experiments on the ImageNet VID dataset demonstrate that MA-YOLO reduces inference cost while maintaining competitive accuracy, offering a practical and efficient solution for edge-based video analysis.
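
The keyframe-and-propagate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `propagate_box` and `detect_video`, the 16-pixel block size, the fixed keyframe interval, and the motion-vector convention (a per-block pixel displacement from the previous frame to the current one) are all assumptions made for illustration. In particular, the sketch replaces the XGBoost-based offset regression with a plain average of the motion vectors inside each box.

```python
from statistics import mean


def propagate_box(box, motion_vectors, block=16):
    """Shift a box (x1, y1, x2, y2) by the mean motion vector inside it.

    motion_vectors is a hypothetical dict mapping a block index (bx, by)
    to a pixel displacement (dx, dy) from the previous frame to this one.
    A plain mean stands in for the learned offset regression (assumption).
    """
    x1, y1, x2, y2 = box
    inside = [
        (dx, dy)
        for (bx, by), (dx, dy) in motion_vectors.items()
        # Keep blocks whose centre lies inside the box.
        if x1 <= (bx + 0.5) * block < x2 and y1 <= (by + 0.5) * block < y2
    ]
    if not inside:
        return box  # no motion information: leave the box where it was
    dx = mean(v[0] for v in inside)
    dy = mean(v[1] for v in inside)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def detect_video(frames, motion_fields, detector, keyframe_interval=4):
    """Run the full detector on keyframes only; propagate boxes elsewhere.

    detector is any callable mapping a frame to a list of boxes.
    motion_fields[i] holds the motion vectors for frame i.
    """
    results, boxes = [], []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            boxes = detector(frame)  # expensive full detection
        else:
            boxes = [propagate_box(b, motion_fields[i]) for b in boxes]
        results.append(boxes)
    return results
```

With a keyframe interval of 4, the detector runs on one frame in four, and the remaining frames only pay for the box update, which is the source of the inference savings the abstract refers to.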

External IDs: dblp:conf/icip/WangCZYK25