Abstract: Highlights
• We propose a Sparse Transformer (SparTa) Block with a limited number of latent tokens to improve the efficiency and performance of the Swin Transformer (see the sketch below).
• We examine the effect of regularization on the attention weights obtained in the SparTa Block.
• We obtain accuracy improvements on the ImageNet100, CIFAR10, and CIFAR100 benchmark datasets.
• We obtain a smaller parameter count than state-of-the-art Transformer models.
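To make the idea of "limited latent tokens" concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a block in which a small set of learned latent tokens attends to the full patch-token sequence, so attention cost scales with the number of latents rather than the sequence length. The module name, dimensions, and the way the latent summary is fused back into the patch tokens are illustrative assumptions; the actual SparTa Block design is specified in the paper body.

```python
# Hypothetical sketch, not the paper's implementation: attention over a
# limited set of learned latent tokens, in PyTorch. Names and the fusion
# step are assumptions for illustration only.
import torch
import torch.nn as nn


class LatentTokenBlock(nn.Module):
    def __init__(self, dim: int, num_latents: int = 16, num_heads: int = 4):
        super().__init__()
        # Limited set of learnable latent tokens shared across the batch.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) patch tokens from a Swin-style stage.
        b = x.size(0)
        latents = self.latents.unsqueeze(0).expand(b, -1, -1)
        # Latents act as queries; patch tokens provide keys and values,
        # so the attention map has num_latents rows instead of num_patches.
        summary, attn_weights = self.attn(latents, x, x)
        # Fuse the latent summary back into the patch tokens (one simple
        # choice among many) with a residual connection.
        update = self.proj(summary.mean(dim=1, keepdim=True))
        return self.norm(x + update)


if __name__ == "__main__":
    block = LatentTokenBlock(dim=96, num_latents=16)
    tokens = torch.randn(2, 196, 96)   # 2 images, 14x14 patches, dim 96
    print(block(tokens).shape)         # torch.Size([2, 196, 96])
```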