Masked hybrid attention with Laplacian query fusion and tripartite sequence matching for medical image segmentation
Abstract: Medical image segmentation is pivotal in computer-aided diagnosis systems, demanding high precision and contextual understanding. Vision Transformer-based approaches have recently attracted considerable attention owing to their strong performance and their ability to capture long-range dependencies in medical images. However, research shows that they suffer from inadequate multi-scale feature integration, poor object localization, and inconsistent mask predictions, leading to sub-optimal segmentation performance. This paper addresses these challenges by redefining semantic medical image segmentation through learnable object queries within an enhanced transformer framework with a masked hybrid attention querying mechanism, optimizing multi-scale feature fusion, object localization, and instance-specific segmentation. First, this study presents a novel transformer-based masked hybrid attention mechanism that applies Laplacian query fusion to learnable query features and incorporates a novel tripartite sequence matching technique in the enhanced decoder block to improve the consistency of mask predictions and optimize decoder queries. The designed hybrid multi-head self- and cross-attention mechanisms selectively integrate multi-scale features, ensuring optimal feature combinations for precise segmentation. Second, multiple class tokens are incorporated to improve object localization and capture class-specific characteristics within the transformer framework, leveraging the transformer decoders’ ability to learn distinct instance representations. Experimental results and extensive ablation studies on three publicly available datasets demonstrate the effectiveness of the proposed approach, which obtains better segmentation results than several state-of-the-art approaches across multiple evaluation metrics. Specifically, the proposed model achieves Dice scores of 95.25%, 92.75%, and 85.25% on the LUNA, ISIC, and DRIVE datasets, respectively.
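For readers unfamiliar with the headline metric, the Dice score between a predicted mask P and a ground-truth mask G is conventionally defined as 2|P ∩ G| / (|P| + |G|). The sketch below illustrates that standard definition for binary masks in plain Python. It is not the authors' evaluation code: the function name `dice_score`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    # Flatten both 2-D masks into 1-D lists of 0/1 pixels.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels that are foreground in both masks.
    intersection = sum(a & b for a, b in zip(p, t))
    total = sum(p) + sum(t)

    # Assumption: two empty masks agree perfectly, so return 1.0.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    # Two overlapping foreground pixels, three foreground pixels per mask.
    print(f"Dice = {dice_score(pred, target):.4f}")
```

In the toy example the masks share two foreground pixels and each contains three, so the score is 2·2 / (3 + 3) ≈ 0.67. Scores such as those quoted in the abstract are typically averaged over all test images.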
External IDs: dblp:journals/nca/EkongYPSUWUC25