PACAF-Net: pixel shuffling based fiderality-preserved up/downsampling and adaptive cross-attention fusion for effective medical image segmentation

Yuanhang Cai, Aouaidjia Kamel, Chongsheng Zhang

Published: 01 Jan 2025, Last Modified: 20 May 2025. Venue: Signal Image Video Process. 2025. License: CC BY-SA 4.0

Abstract: Lesion area segmentation is a prerequisite for computer-aided clinical diagnosis. Although deep learning-based techniques have significantly advanced the field, most of them lose fine-grained features during encoder-decoder feature fusion, especially when they rely on traditional upsampling or downsampling methods. To address this problem, we propose PACAF-Net (Pixel shuffle based fiderality-preserved up/downsampling and Adaptive Cross-Attention Fusion Network), a new learning architecture that leverages Transformer encoders to capture long-range dependencies. Instead of traditional sampling operations such as pooling, which lead to information loss, PACAF-Net uses Pixel Shuffle/Unshuffle as its sampling method to minimize feature loss during the upsampling and downsampling stages. To improve feature fusion, PACAF-Net incorporates a newly proposed Adaptive Cross-Attention Fusion module that upscales features while preserving spatial information across channels. Feature loss during upsampling is mitigated by a Pixel Shuffle Upsampling module. Moreover, we introduce a Hierarchical Feature Guidance module, which captures spatial features at different scales using multiple Transformer encoders. Evaluation of PACAF-Net on public medical image segmentation datasets shows that it surpasses state-of-the-art methods. Code and models are available in the GitHub repository: https://github.com/pinttouge/PACAF-Net.
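The abstract's claim that Pixel Shuffle/Unshuffle avoids the information loss of pooling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); the function names `pixel_unshuffle` and `pixel_shuffle`, the channel-first `(C, H, W)` layout, and the channel ordering are assumptions chosen for illustration. The point it shows is that unshuffling only rearranges pixels from space into channels, so the inverse shuffle recovers the input exactly, whereas 2x2 max pooling keeps one value out of every four.

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Downsample (C, H, W) -> (C*r*r, H/r, W/r) by moving each r x r
    spatial block into the channel dimension. No values are discarded."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)   # axes: c, row, i, col, j
    x = x.transpose(0, 2, 4, 1, 3)           # axes: c, i, j, row, col
    return x.reshape(c * r * r, h // r, w // r)

def pixel_shuffle(x, r):
    """Upsample (C*r*r, H, W) -> (C, H*r, W*r); exact inverse of
    pixel_unshuffle for the same factor r."""
    c2, h, w = x.shape
    assert c2 % (r * r) == 0, "channel count must be divisible by r*r"
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)             # axes: c, i, j, row, col
    x = x.transpose(0, 3, 1, 4, 2)           # axes: c, row, i, col, j
    return x.reshape(c, h * r, w * r)

def max_pool_2x2(x):
    """Conventional 2x2 max pooling, shown only for comparison."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

# Toy feature map: 2 channels, 4 x 4 spatial grid (illustrative data only).
x = np.arange(32, dtype=np.float32).reshape(2, 4, 4)

down = pixel_unshuffle(x, 2)      # shape (8, 2, 2): same 32 values, rearranged
restored = pixel_shuffle(down, 2) # shape (2, 4, 4): identical to x
pooled = max_pool_2x2(x)          # shape (2, 2, 2): only 8 of 32 values remain
```

In this sketch the round trip `pixel_shuffle(pixel_unshuffle(x, r), r)` reproduces `x` element for element, while no operation can reconstruct `x` from `pooled` alone. In a learned network the rearrangement would typically be paired with convolutions that mix the resulting channels; how PACAF-Net does this is not stated in the abstract.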