MCT-Net: a Lightweight Multiscale Convolutional Transformer Network for Polyp Segmentation

Niladri Chakraborti; Deepak Ranjan Nayak

Published: 01 Jan 2024, Last Modified: 07 Jun 2025, Venue: ICIP 2024, License: CC BY-SA 4.0

Abstract: Accurate polyp segmentation is paramount for diagnosing colorectal cancer (CRC). Although deep learning has driven significant progress in polyp segmentation, the wide variation in polyp shape and size across stages of development keeps the task challenging. To this end, we propose MCT-Net, a novel lightweight multiscale convolutional transformer network that integrates the benefits of convolution and transformers for accurate polyp segmentation. MCT-Net is a U-shaped architecture comprising mainly a multiscale encoder and a transformer decoder. The encoder learns feature representations at multiple scales and then applies a cascaded attention block that learns to emphasize only polyp regions. The transformer decoder, in turn, models long-range contextual dependencies through a modified self-attention mechanism and preserves fine-grained contextual details through a skip connection. MCT-Net thereby mitigates the limitations of using convolution operations or transformers alone. Quantitative and qualitative results and comparisons on three benchmark datasets confirm the effectiveness of MCT-Net over state-of-the-art segmentation methods. Further, ablation studies verify the impact of each component introduced in the encoder and decoder blocks of MCT-Net.
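
As a rough illustration of the decoder idea described in the abstract (self-attention for long-range context, plus a skip connection that carries the original features forward), the sketch below implements standard scaled dot-product self-attention in plain Python. It is not the modified self-attention of MCT-Net, whose details the abstract does not give. The function names, the identity query/key/value projections, and the toy token vectors are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_with_skip(tokens):
    """Standard scaled dot-product self-attention followed by a skip connection.

    tokens: list of n feature vectors, each a list of d floats.
    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this token to every token (long-range context).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Attention output: weighted average of all value vectors.
        attended = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        # Skip connection: add the input features back to the attended ones.
        outputs.append([x + a for x, a in zip(q, attended)])
    return outputs


# Hypothetical toy input: three 2-dimensional tokens.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_output = self_attention_with_skip(toy_tokens)
```

The output has the same shape as the input, so a block like this can be stacked or combined with encoder features of matching size; the skip connection means each output still contains its own input features even when the attention weights spread across many tokens.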
