CerMixer: An Efficient Model for Cervical Cancer Classification Based on Patching and Multi-scale Depthwise Convolutional Fusion

Thanh-An Pham; Van-Dung Hoang; Doan-Hieu Tran; Tuong-Lan Le Van

Published: 01 Jan 2025, Last Modified: 19 Jun 2025, Venue: ACIIDS (1) 2025, License: CC BY-SA 4.0

Abstract: Transfer learning is often applied to medical image analysis to address data scarcity and overfitting during model training. This paper presents a new approach for medical image classification on small datasets. The proposed model, CerMixer, consists of patch embedding blocks, spatial mixing blocks, and channel mixing blocks. The spatial mixing block combines an MBConv layer with a new multi-scale depthwise convolutional fusion block. The channel mixing block uses a 1 × 1 convolution followed by a GELU activation layer and a BatchNorm layer. The method is evaluated on two public datasets: Mendeley Liquid Based Cytology (LBC) and SIPaKMeD. On the LBC dataset, CerMixer achieves 100% accuracy with pre-training and 99.87% when trained from scratch. On the SIPaKMeD dataset, the corresponding accuracies are 99.21% and 99.06%. When trained from scratch, our model outperforms the Vision Transformer (ViT) and the Swin Transformer. When pre-trained on the ImageNet 1K dataset, the proposed model performs on par with ViT and is only slightly inferior to the Swin Transformer, while using fewer parameters.
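To make the parameter-efficiency argument concrete, the following is a minimal sketch that counts the weights of a multi-scale depthwise spatial mixer and a 1 × 1 channel mixer, and compares them with dense convolutions of the same kernel sizes. The channel width (256), kernel sizes (3, 5, 7), image size (224) and patch size (16) are illustrative assumptions and are not taken from the paper; the helper names are hypothetical. Only the standard convolution weight-count formula is used.

```python
# Illustrative weight counting; all numeric settings below are assumptions,
# not values reported for CerMixer.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count (no bias) of a 2D convolution with a k x k kernel."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

def num_patches(height, width, patch):
    """Number of non-overlapping patches produced by patch embedding."""
    assert height % patch == 0 and width % patch == 0
    return (height // patch) * (width // patch)

def multiscale_depthwise_params(channels, kernel_sizes):
    """Total weights of parallel depthwise convolutions, one per kernel size."""
    return sum(
        conv_params(channels, channels, k, groups=channels)
        for k in kernel_sizes
    )

CHANNELS = 256        # hypothetical embedding width
KERNELS = (3, 5, 7)   # hypothetical multi-scale kernel sizes

# Spatial mixing: depthwise branches versus dense convolutions of equal size.
depthwise = multiscale_depthwise_params(CHANNELS, KERNELS)
dense = sum(conv_params(CHANNELS, CHANNELS, k) for k in KERNELS)

# Channel mixing: a single 1 x 1 (pointwise) convolution.
pointwise = conv_params(CHANNELS, CHANNELS, 1)

# Patch embedding: token grid for a hypothetical 224 x 224 input, 16 x 16 patches.
patches = num_patches(224, 224, 16)

print(f"patches={patches} depthwise={depthwise} "
      f"dense={dense} pointwise={pointwise}")
```

Under these assumed settings the depthwise branches need a factor of `CHANNELS` fewer weights than dense convolutions with the same kernels, which is the general reason depthwise spatial mixing paired with 1 × 1 channel mixing keeps such models small.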
