Abstract: Image segmentation and image reconstruction are two of the most prominent tasks in current computer vision research, with numerous advanced models contributing to improved accuracy on each task. However, most existing models for these tasks are trained independently, overlooking the complementary potential between the tasks during training. In this work, we propose a progressive segmentation refinement strategy by designing a dual-stage joint multi-task consistency learning model based on the Transformer, effectively combining the image segmentation and image reconstruction tasks to achieve fine-grained segmentation of medical images. Specifically, we present a dual-stage joint multi-task consistency learning network consisting of a shared Transformer encoder and two independent Transformer decoders, responsible for image segmentation and lesion-region reconstruction, respectively. The reconstruction task helps the model learn feature representations of lesion regions, refining segmentation boundaries and improving segmentation precision. In addition, the model leverages semi-supervised learning by computing a loss on the reconstructed masked lesion regions, further enhancing its generalizability. Experimental results on the Kvasir-SEG, Kvasir-Capsule, ISIC 2016, and ISIC 2018 datasets demonstrate that our method outperforms other state-of-the-art methods.
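The joint objective described above (a segmentation loss over the whole image plus a reconstruction loss restricted to masked lesion pixels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the per-term losses (binary cross-entropy and MSE), and the weighting factor `lam` are assumptions.

```python
import numpy as np

def joint_loss(seg_pred, seg_gt, rec_pred, image, lesion_mask, lam=0.5):
    """Hypothetical sketch of a joint multi-task objective: a segmentation
    term over all pixels plus a reconstruction term computed only on the
    masked lesion pixels. Loss choices and weighting are assumptions."""
    eps = 1e-7
    seg_pred = np.clip(seg_pred, eps, 1 - eps)
    # Binary cross-entropy over the whole image for the segmentation decoder.
    l_seg = -np.mean(seg_gt * np.log(seg_pred)
                     + (1 - seg_gt) * np.log(1 - seg_pred))
    # MSE restricted to masked lesion pixels for the reconstruction decoder.
    m = lesion_mask.astype(bool)
    l_rec = np.mean((rec_pred[m] - image[m]) ** 2) if m.any() else 0.0
    return l_seg + lam * l_rec
```

In this sketch, improving the reconstruction of the masked lesion region strictly lowers the joint loss, which is the mechanism by which the reconstruction branch can guide the shared encoder toward better lesion features.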
External IDs: dblp:journals/eaai/HanZHLYWW25