PSA-Swin Transformer: Image Classification on Small-Scale Datasets

Published: 01 Jan 2024 · Last Modified: 13 May 2025 · SMC 2024 · CC BY-SA 4.0
Abstract: This paper introduces the PSA-Swin Transformer, a novel framework for image classification on small-scale datasets, addressing the challenge of training effective models in resource-limited environments. Recognizing that current deep learning methods rely heavily on large-scale datasets and extensive pre-training, we propose a model that handles small data volumes effectively without pre-trained weights. The key to our approach is an efficient positional embedding (EPE) module, which improves parameter utilization and network expressiveness through a grouped convolutional architecture with shuffling operations for dynamic information exchange across groups. In addition, we integrate the Polarized Self-Attention (PSA) module into Window Multi-Head Self-Attention (W-MSA), yielding a new module we call PSA-W-MSA; PSA tames the complexity of learning element-specific attention by combining polarization filtering with enhancement techniques. In experiments on the Mini-ImageNet dataset, the PSA-Swin Transformer demonstrates strong performance, especially where high-quality annotated data is scarce or costly to acquire. We expect these results to benefit applications that require efficient and accurate image classification under limited resources.
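The abstract describes the EPE module only at a high level: a grouped convolution plus a shuffling operation, used as a positional embedding. As a rough illustration of that idea, here is a minimal PyTorch sketch; the class name, the residual connection, and the group/kernel hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Interleave channels so information is exchanged across convolution groups.
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class EfficientPositionalEmbedding(nn.Module):
    """Hypothetical EPE-style module: a grouped convolution injects local
    positional information with few parameters, and a channel shuffle lets
    the groups exchange information dynamically. The result is added
    residually to the input feature map."""

    def __init__(self, dim: int, groups: int = 4, kernel_size: int = 3):
        super().__init__()
        assert dim % groups == 0, "channel count must be divisible by groups"
        self.groups = groups
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a transformer stage
        return x + channel_shuffle(self.conv(x), self.groups)

# Usage sketch: inject positional information into a stage-1 feature map.
epe = EfficientPositionalEmbedding(dim=96, groups=4)
out = epe(torch.randn(2, 96, 56, 56))  # shape preserved: (2, 96, 56, 56)
```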