Abstract: Although initially designed for image generation, diffusion models can be effectively applied to a variety of tasks, including semantic segmentation. However, most existing diffusion-based approaches to semantic segmentation operate in high-dimensional pixel space, which demands substantial computing and memory resources during both training and inference. This paper makes the first attempt to utilize latent diffusion models for semantic segmentation. Specifically, we propose a fast yet effective latent diffusion model and evaluate it on medical image segmentation tasks. First, we train a Variational Autoencoder (VAE) to convert binary segmentation masks into compact latent vectors. The diffusion process can then be executed in this low-dimensional latent space and is thus drastically accelerated. Next, we employ the VAE decoder to reconstruct a precise prediction map from the latent vector produced by the diffusion process. Finally, we refine the segmentation results through a straightforward post-processing step based on morphological operations. We report results on two public datasets, one of colon polyp images and one of skin cancer images. Experiments show that our approach achieves accuracy competitive with traditional diffusion models while offering much faster training and inference and considerably lower memory consumption.
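
As an illustration of the final step mentioned above, the following is a minimal sketch of morphological post-processing on a binary prediction mask. The choice of scipy.ndimage, the use of binary opening and closing, and the iteration count are assumptions for illustration, not the paper's stated implementation.

    # Sketch: clean a binary segmentation mask with morphological operations.
    # Assumed choices: scipy.ndimage, one iteration of opening then closing.
    import numpy as np
    from scipy import ndimage

    def refine_mask(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
        """Remove small speckles and fill small holes in a binary mask."""
        mask = mask.astype(bool)
        # Opening removes isolated false-positive pixels.
        mask = ndimage.binary_opening(mask, iterations=iterations)
        # Closing fills small gaps inside the predicted region.
        mask = ndimage.binary_closing(mask, iterations=iterations)
        return mask.astype(np.uint8)

    if __name__ == "__main__":
        # Toy example: a 64x64 square object corrupted by speckle noise.
        rng = np.random.default_rng(0)
        clean = np.zeros((64, 64), dtype=np.uint8)
        clean[16:48, 16:48] = 1
        noisy = np.logical_xor(clean, rng.random((64, 64)) < 0.02).astype(np.uint8)
        refined = refine_mask(noisy)
        print("pixels wrong before refinement:", int((noisy != clean).sum()))
        print("pixels wrong after refinement:", int((refined != clean).sum()))

In practice such a step only touches the decoded prediction map, so it adds negligible cost on top of the latent diffusion and VAE decoding stages.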