Residual Block Fusion in Low Complexity Neural Network-Based In-loop Filtering for Video Compression

Tong Shao; Jay N. Shingala; Ajay Shyam; Peng Yin; Ajat Suneja; Siddarth P. Badya; Arjun Arora; Sean McCarthy

Published: 01 Jan 2024, Last Modified: 16 May 2025, DCC 2024, CC BY-SA 4.0

Abstract: This paper proposes a novel low-complexity, residual block fusion (RBF) based split luma-chroma architecture to improve the coding efficiency of neural network-based in-loop filters in video compression. The residual block in this architecture consists of a 1x1 convolution layer with wide activation and a regular 3x3 convolutional layer that is decomposed, via Canonical Polyadic (CP) decomposition, into 1x1 pointwise convolutions and 1x3/3x1 separable convolutions to reduce complexity. Adjusting the location of the skip connection in each residual block allows adjacent 1x1 pointwise convolutions to be fused. The RBF backbone uses a new wide activation that starts directly with PReLU and is followed by a 1x1 convolution, while the 1x1 layers after CP decomposition are fully fused. This fusion design reduces the complexity from 17.05 kMac/Pixel to 16.56 kMac/Pixel and the number of convolutional layers by 13%. Experimental results show that the new RBF architecture’s BD-Rate is {-0.11%, -0.31%, -0.33%} under All Intra (AI) and {-0.14%, 0.66%, 1.56%} under Random Access (RA) compared to the existing residual block design, while the BD-Rate of the proposed RBF loop filter compared to the VTM anchor is {-4.77%, -9.14%, -9.13%} under AI and {-5.46%, -9.31%, -9.20%} under RA. The actual decoding time is reduced by around 5% after residual block fusion. The BD-Rate versus kMac/Pixel plot also shows a superior trade-off between complexity and coding gain compared to state-of-the-art filters.
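
The fusion step rests on a general linear-algebra fact: two 1x1 convolutions applied back to back, with no activation or skip connection between them, are equivalent to a single 1x1 convolution whose weight matrix is the product of the two. The sketch below illustrates that fact and the per-pixel multiply-accumulate (MAC) counts of a 3x3 layer versus a CP-decomposed one. It is not the authors' implementation: the channel widths, the rank, and all function names are hypothetical, biases are omitted, and the 1x3/3x1 layers are assumed to be per-channel (depthwise) filters as in the usual CP form.

```python
import random

def pointwise_conv(x, w):
    """1x1 convolution at a single pixel.
    x: list of C_in channel values; w: C_out x C_in weights (list of rows)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def fuse_pointwise(w_first, w_second):
    """Fold two adjacent 1x1 convolutions into one weight matrix.
    Valid only when nothing nonlinear sits between them.
    Result is the matrix product w_second @ w_first."""
    mid = len(w_first)        # channels between the two layers
    c_in = len(w_first[0])
    return [
        [sum(row[k] * w_first[k][j] for k in range(mid)) for j in range(c_in)]
        for row in w_second
    ]

def macs_per_pixel_3x3(c_in, c_out):
    """MACs per output pixel of a dense 3x3 convolution."""
    return 9 * c_in * c_out

def macs_per_pixel_cp(c_in, c_out, rank):
    """MACs per pixel of a CP-decomposed 3x3 layer (assumed form):
    1x1 (c_in -> rank), depthwise 3x1, depthwise 1x3, 1x1 (rank -> c_out)."""
    return c_in * rank + 3 * rank + 3 * rank + rank * c_out

# Hypothetical sizes chosen only for illustration: 8 -> 4 -> 6 channels.
random.seed(0)
w_a = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
w_b = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
x = [random.uniform(-1, 1) for _ in range(8)]

two_step = pointwise_conv(pointwise_conv(x, w_a), w_b)   # two layers
w_fused = fuse_pointwise(w_a, w_b)                       # one 6 x 8 layer
one_step = pointwise_conv(x, w_fused)

# The two paths agree up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(two_step, one_step))
print("max abs difference:", max_err)
print("3x3 MACs/pixel:", macs_per_pixel_3x3(16, 16))
print("CP  MACs/pixel:", macs_per_pixel_cp(16, 16, 16))
```

Fusion always removes one layer, but it lowers the MAC count only when the fused matrix is smaller than the two originals combined (here 6x8 = 48 versus 8x4 + 4x6 = 56), which is consistent with the modest kMac/Pixel reduction alongside the larger layer-count reduction reported in the abstract.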
