SPGNet: A Serial-Parallel Gated Convolutional Network for Image Classification on Small Datasets

Published: 01 Jan 2024 · Last Modified: 13 Nov 2025 · IJCNN 2024 · License: CC BY-SA 4.0
Abstract: Vision Transformers (ViTs) are difficult to train and deploy as deep models because they lack inductive biases. Prior work has integrated key ingredients of ViTs (e.g., long-range relations or input-adaptive weights) into convolutional neural networks (CNNs) to address this issue on large-scale datasets such as ImageNet-1K, but how these ingredients perform on small-scale datasets has received little attention. In this paper, we decompose large-kernel convolution in a serial-parallel manner to extract multi-scale image features and integrate the resulting operators into a gated convolutional architecture, yielding a network backbone for image classification on small-scale datasets, called SPGNet. Experiments on public small-scale classification benchmarks show that SPGNet achieves a Top-1 accuracy of 86.62% on CIFAR-100 and 76.57% on Tiny ImageNet. We further evaluate SPGNet on semantic segmentation, where it also achieves promising results under similar architectures and training configurations.
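The abstract only sketches the design, so the PyTorch snippet below is a minimal illustrative block, not the authors' implementation: it assumes a serial chain of small depthwise convolutions whose intermediate outputs serve as parallel multi-scale branches, fused and then modulated by an input-adaptive gate. The class name, kernel size, and number of stages are hypothetical choices for illustration.

```python
# Hypothetical sketch of a serial-parallel gated convolution block
# (assumed structure, not the official SPGNet code).
import torch
import torch.nn as nn


class SerialParallelGatedBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 5, num_stages: int = 3):
        super().__init__()
        # Serial chain of small depthwise convolutions; each stage enlarges
        # the effective receptive field, and its output is kept as one
        # parallel multi-scale branch.
        self.stages = nn.ModuleList(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
            for _ in range(num_stages)
        )
        # 1x1 convolution that fuses the concatenated multi-scale branches.
        self.fuse = nn.Conv2d(dim * num_stages, dim, kernel_size=1)
        # Gating branch: an input-adaptive weight map in [0, 1].
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.Sigmoid())
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats, h = [], x
        for conv in self.stages:      # serial path
            h = conv(h)
            feats.append(h)           # collect parallel multi-scale outputs
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.proj(fused * self.gate(x)) + x  # gated fusion + residual


if __name__ == "__main__":
    block = SerialParallelGatedBlock(dim=64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Stacking small depthwise convolutions in series approximates a large kernel at lower cost, while keeping each stage's output as a separate branch exposes multiple receptive-field scales at once; the actual SPGNet block may differ in branch fusion and gating details.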