Machine-Attention-based Video Coding for Machines

Yegi Lee, Shin Kim, Kyoungro Yoon, Hanshin Lim, Sangwoon Kwak, Hyon-Gon Choo

Published: 01 Jan 2023, Last Modified: 10 Nov 2025, ICIP 2023, CC BY-SA 4.0

Abstract: Conventional video coding methods have been developed based on the human visual system (HVS). However, in recent years, video has come to occupy a large share of internet traffic, and the amount of video data for machine consumption has increased rapidly due to the progress of neural networks. This paper proposes a novel machine-attention-based video coding method for machines. Inspired by saliency-driven research, we first extract attention regions, which strongly affect machine vision performance, from the object detection network. Subsequently, a maximum a posteriori (MAP)-based bit allocation method is applied to assign more bits to the attention regions. Our proposed method helps maintain high machine vision performance while reducing the bitrate. Experimental results show that our proposed method achieves up to 34.89% Bjøntegaard delta (BD)-rate reduction for the video dataset and up to 44.70% BD-rate reduction for the image dataset compared to state-of-the-art video coding technology.
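The BD-rate figures quoted in the abstract can be read with the help of a small sketch of the commonly used Bjøntegaard delta rate calculation. This is a minimal illustration, not the authors' evaluation code: the function name `bd_rate`, the four-point rate/quality curves, and the choice of quality score (PSNR, or a task metric such as detection accuracy) are all assumptions made for the example. It fits a cubic polynomial to log-bitrate as a function of quality for each codec, then averages the gap between the two fits over the shared quality range.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate change (in percent) of a test codec versus an anchor.

    Hypothetical helper for illustration. Negative values mean the test
    codec needs fewer bits than the anchor for the same quality score.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality score.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the quality interval the curves share.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up sample points: the test codec uses 30% fewer bits everywhere.
    quality = [30.0, 33.0, 36.0, 39.0]
    anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
    test_rates = [0.7 * r for r in anchor_rates]
    print(bd_rate(anchor_rates, quality, test_rates, quality))
```

In this made-up example the test codec spends 30% fewer bits at every quality point, so the result is approximately -30, which corresponds to what the abstract would call a 30% BD-rate reduction.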