Stabilizing the Convolution Operations for Neural Network-Based Image and Video Codecs for Machines

Published: 01 Jan 2023, Last Modified: 17 Jul 2025, ICME Workshops 2023, CC BY-SA 4.0
Abstract: Deep convolutional neural networks are generally trained in the floating-point number format. However, the convolution operation in the floating-point domain suffers from numerically unstable behavior due to the limited precision and range of the number format. For deep convolutional neural network-based image/video codecs, this instability can cause corrupted reconstructions when the decoder runs in a different computing environment. This paper proposes a post-training quantization technique in which the convolution operations are performed in the integer domain while other operations remain in the floating-point domain. We derive optimal scaling factors and a bit allocation strategy for the input tensor and kernel weights. With the derived scaling factors, the codec can use the significand bits of the single-precision floating-point number for the convolution operations, so the system is not required to support integer operations. Experiments on a learned image codec for machine consumption show that the proposed method achieves performance similar to the floating-point version while behaving identically across different platforms.
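The core idea can be sketched as follows: if the input tensor and kernel weights are quantized to integer values whose products and accumulations fit within the exactly representable integer range of the number format, the convolution becomes bit-exact and thus reproducible across platforms. The sketch below is a minimal illustration under assumed bit budgets (`x_bits`, `w_bits`); the paper's derived optimal scaling factors and bit allocation are not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to signed integer-valued floats.

    Returns the quantized values and the scale used. This naive max-abs
    scaling is illustrative only; the paper derives optimal scales.
    """
    scale = (2 ** (bits - 1) - 1) / max(np.abs(x).max(), 1e-12)
    return np.round(x * scale), scale

def stable_conv1d(x, w, x_bits=12, w_bits=8):
    """1-D convolution on integer-valued operands, rescaled afterwards.

    Because every product and partial sum is an integer small enough to be
    exactly representable, the result is deterministic regardless of the
    platform's floating-point accumulation order.
    """
    xq, sx = quantize(x, x_bits)
    wq, sw = quantize(w, w_bits)
    yq = np.convolve(xq, wq, mode="valid")  # integer-domain convolution
    return yq / (sx * sw)                   # map back to the float domain
```

With modest bit budgets the quantized result stays close to the floating-point convolution while every intermediate value is an exact integer; in float32 the total bit budget would be capped by the 24-bit significand, as the abstract notes.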