Accurate Boundary Alignment and Realism Enhancement for Colonoscopic Polyp Image-Mask Pair Generation

Published: 01 Jan 2025, Last Modified: 04 Nov 2025, Venue: MICCAI (10) 2025, License: CC BY-SA 4.0
Abstract: Polyp segmentation is the foundation of colonoscopic lesion screening, diagnosis, and therapy, but the available images and annotations are limited in number. The latent diffusion model (LDM) has emerged as a powerful tool for synthesizing high-quality medical images at low computational cost. However, generating boundary-aligned image-mask pairs and realistic images remains an open challenge, for two reasons: (i) the spatial relationship between the image and mask boundaries is easily distorted in the latent space; (ii) the diversity of polyp colors, shapes, and textures, together with low boundary contrast and textures similar to the surrounding tissue, makes polyp boundaries difficult to distinguish. This paper proposes Polyp-LDM, which encodes polyp images and masks into the same latent space via a unified variational autoencoder (VAE) to align their boundaries. Furthermore, Polyp-LDM refines texture and lighting while preserving structure by fine-tuning the VAE decoder with data augmentation and applying a style cloning module to enhance image realism. Quantitative evaluations and a user preference study demonstrate that our method outperforms existing methods in image-mask pair generation. Moreover, segmentation models trained with augmented data generated by Polyp-LDM achieve the best performance on three public polyp datasets. The code is available at https://github.com/16rq/Polyp-LDM.