Deep Block Transform for Autoencoders

Kyong Hwan Jin

Published: 2021, Last Modified: 29 Sept 2023. IEEE Signal Process. Lett. 2021. Readers: Everyone

Abstract: We discover that a trainable convolution layer with a stride ≥ 2 and a kernel size ≥ the stride is identical to a trainable block transform. For instance, with equal widths, such as a 2×2 convolution kernel and a stride of 2, the sliding windows do not overlap, so the layer performs a block transform on the partitioned 2×2 blocks. Because the stride is ≥ 2, the block transform also reduces computational complexity. To restore the original size, we apply a transposed convolution (stride = kernel ≥ 2), which is the adjoint operator of the forward block transform. Based on this relationship, we propose a trainable multi-scale block transform for autoencoders. The encoder consists of two sequential convolutions, each with a 2×2 kernel and a stride of 2, and the decoder consists of the two adjoint operators of the encoder (transposed convolutions). Clipping is used as the nonlinear activation. Inspired by the zero-frequency element in dictionary learning, the proposed method uses DC values for residual learning. The proposed method produces high-resolution representations, whereas a stride-1 convolutional autoencoder with 3×3 kernels generates blurry images.
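
The two relationships stated in the abstract can be checked numerically on a toy example. The sketch below is not the paper's implementation: it assumes a single input and output channel, no padding, and a random 2×2 kernel, and every function name in it (`conv2d_stride`, `block_transform`, `conv_transpose2d`, `inner`) is hypothetical. It compares a stride-2 convolution with an explicit per-block projection, then tests the adjoint identity ⟨Ax, y⟩ = ⟨x, Aᵀy⟩ for the transposed convolution.

```python
import random


def conv2d_stride(image, kernel, stride):
    """Strided 2-D cross-correlation, single channel, no padding."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [
        [
            sum(
                kernel[a][b] * image[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def block_transform(image, basis, block):
    """Split the image into non-overlapping block x block tiles and
    project each flattened tile onto one basis vector."""
    out = []
    for i in range(0, len(image), block):
        row = []
        for j in range(0, len(image[0]), block):
            tile = [image[i + a][j + b] for a in range(block) for b in range(block)]
            row.append(sum(w * x for w, x in zip(basis, tile)))
        out.append(row)
    return out


def conv_transpose2d(coeffs, kernel, stride):
    """Transposed convolution: each coefficient scatters a scaled kernel copy."""
    k = len(kernel)
    out_h = (len(coeffs) - 1) * stride + k
    out_w = (len(coeffs[0]) - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(coeffs):
        for j, c in enumerate(row):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += c * kernel[a][b]
    return out


def inner(x, y):
    """Euclidean inner product of two equally sized 2-D arrays."""
    return sum(p * q for rx, ry in zip(x, y) for p, q in zip(rx, ry))


rng = random.Random(0)  # fixed seed so the check is reproducible
image = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
kernel = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
probe = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

# Stride-2 convolution with a 2x2 kernel versus an explicit block transform
# whose single basis vector is the flattened kernel.
conv_out = conv2d_stride(image, kernel, 2)
basis = [kernel[a][b] for a in range(2) for b in range(2)]
block_out = block_transform(image, basis, 2)
same = all(
    abs(p - q) < 1e-12
    for rc, rb in zip(conv_out, block_out)
    for p, q in zip(rc, rb)
)

# Adjoint check: <A x, y> should equal <x, A^T y> up to rounding error.
restored = conv_transpose2d(probe, kernel, 2)
adjoint_gap = abs(inner(conv_out, probe) - inner(image, restored))

print("conv equals block transform:", same)
print("adjoint gap:", adjoint_gap)
```

In a full layer with several output channels, each channel would correspond to one basis vector applied to the same 2×2 block, so the single-channel case here illustrates only one row of such a transform.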
