Multi-Modal Deep Reinforcement Learning for Edge-Assisted Video Analytics

Shen He, Chaokun Zhang, Aojia Lv, Jingshun Du, Wenyu Qu

Published: 2023, Last Modified: 13 Nov 2024. Venue: CSCWD 2023. License: CC BY-SA 4.0

Abstract: With the rise of artificial intelligence, video analytics models have been applied in many fields. Many studies focus on enlarging models to achieve higher accuracy, yet inference latency becomes prohibitive when such models are deployed on resource-constrained terminal devices. Edge computing enables efficient and accurate inference by offloading video inference tasks to edge servers, which offer low network latency and high-performance hardware. However, edge-assisted video analytics systems face challenges from the dynamic nature of video frames, fluctuating network signals, and the mismatch between computing power and model size. To overcome these obstacles, we propose MDRL, an edge-assisted video analytics framework based on Multi-modal Deep Reinforcement Learning. MDRL adaptively determines the offloading strategy for video frames by observing multi-modal information from the video frames and network signals, and it updates its parameters using Deep Reinforcement Learning (DRL) algorithms. We compare MDRL with various baselines, and the experimental results show that MDRL achieves the best overall trade-off among latency, accuracy, and network bandwidth consumption across various experimental scenarios.
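
The decision loop described in the abstract (observe frame and network information, choose an offloading action, learn from the outcome) can be illustrated with a toy sketch. This is not the MDRL implementation: the two-action set, the feature vectors, the reward weights, and the linear value estimate that stands in for a deep network are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

# Hypothetical action set; the abstract does not enumerate MDRL's actual actions.
ACTIONS = ("local", "offload")


def fuse(frame_features, network_features):
    """Concatenate the two modalities into one observation vector."""
    return list(frame_features) + list(network_features)


def reward(accuracy, latency_s, bandwidth_mb, w_acc=1.0, w_lat=1.0, w_bw=0.1):
    """Illustrative scalar reward; the weights are assumptions, not from the paper."""
    return w_acc * accuracy - w_lat * latency_s - w_bw * bandwidth_mb


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy selection over per-action value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def update(weights, obs, action, target, lr=0.1):
    """One gradient step moving a linear value estimate toward the observed reward."""
    pred = sum(w * x for w, x in zip(weights[action], obs))
    err = target - pred
    weights[action] = [w + lr * err * x for w, x in zip(weights[action], obs)]
    return err


if __name__ == "__main__":
    rng = random.Random(0)
    obs = fuse([0.2], [0.5, 0.1])  # made-up frame and network features
    weights = [[0.0] * len(obs) for _ in ACTIONS]
    q = [sum(w * x for w, x in zip(ws, obs)) for ws in weights]
    a = choose_action(q, epsilon=0.1, rng=rng)
    r = reward(accuracy=0.9, latency_s=0.2, bandwidth_mb=1.0)  # made-up outcome
    update(weights, obs, a, r)
    print(ACTIONS[a], r)
```

A real system would replace the linear estimate with a neural network over learned frame and signal encodings and would use a full DRL algorithm; the sketch only shows how a single reward can combine latency, accuracy, and bandwidth.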