Abstract: Feature compression is an important branch of video coding for machines (VCM). While existing methods draw inspiration from image compression, they have not fully utilized the unique characteristics of features. In this paper, we investigate feature characteristics in two key aspects: dimensionality and sparsity. Our analysis reveals that the low spatial dimensionality and high channel dimensionality of features make traditional 2D convolution-based methods, which usually downsample along spatial dimensions while increasing channels, unsuitable for feature compression. To address this, we propose compressing features using 3D convolution. Additionally, considering the sparsity characteristic, we propose applying sparse convolution to reduce model complexity. To thoroughly investigate the proposed 3D sparse convolution-based method, we verify it with various network structures and input features. Experimental results demonstrate the superiority of our proposed method over traditional 2D convolution-based approaches, highlighting its potential for effective feature compression.
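As a rough illustration of the dimensionality argument, the sketch below applies the standard convolution output-size formula to a feature tensor with many channels and a small spatial extent. It compares striding over the spatial axes, as a typical 2D pipeline does, with striding over the channel axis, which becomes possible once the channels are treated as the depth of a 3D volume. The feature shape (256, 16, 16), the kernel, stride, and padding values, and the helper names `conv_out` and `reduce_shape` are assumptions chosen for illustration; they are not taken from the paper.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def reduce_shape(shape, strides, stages):
    """Apply `stages` strided convolutions to a (C, H, W) shape.

    `strides` gives the stride used along (C, H, W). A stride of 1 with
    kernel 3 and padding 1 leaves that axis unchanged. For the 2D case the
    channel count is simply held fixed here, although real 2D codecs
    usually increase it.
    """
    c, h, w = shape
    for _ in range(stages):
        c = conv_out(c, stride=strides[0])
        h = conv_out(h, stride=strides[1])
        w = conv_out(w, stride=strides[2])
    return (c, h, w)


# Hypothetical intermediate feature: many channels, small spatial size.
feature_shape = (256, 16, 16)

# 2D-style downsampling: stride 2 over H and W only.
spatial_two_stages = reduce_shape(feature_shape, (1, 2, 2), stages=2)
spatial_four_stages = reduce_shape(feature_shape, (1, 2, 2), stages=4)

# 3D-style downsampling (assumed variant): stride 2 over the channel axis.
channel_four_stages = reduce_shape(feature_shape, (2, 1, 1), stages=4)

print("spatial, 2 stages:", spatial_two_stages)
print("spatial, 4 stages:", spatial_four_stages)
print("channel, 4 stages:", channel_four_stages)
```

Under these assumptions, four spatial stages collapse the 16x16 grid to a single position, whereas four channel-axis stages keep the full spatial grid and shrink only the abundant channel dimension. Two spatial stages and four channel-axis stages produce the same number of latent elements, arranged differently.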