PathUp: Patch-wise Timestep Tracking for Multi-class Large Pathology Image Synthesising Diffusion Model

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Oral, CC BY 4.0
Abstract: In digital pathology, cancer lesions are identified by analyzing the spatial context within pathology images. Synthesizing such complex spatial context is challenging because pathology whole slide images typically have high resolution, exhibit low inter-class variety, and are sparsely labeled. To address these challenges, we propose PathUp, a novel diffusion model tailored for the synthesis of multi-class high-resolution pathology images. Our approach includes latent-space patch-wise timestep tracking, which helps generate high-quality images without tiling artifacts. Expert pathology knowledge is integrated into the model through our patho-align mechanism. To ensure robust generation of lesion subtypes and scale information, we introduce a feature entropy loss function. We substantiate the effectiveness of our method through both qualitative and quantitative evaluations, supplemented by assessments from human experts, demonstrating the authenticity of the synthetic data produced. Furthermore, we highlight the potential utility of our generated images as an augmentation method for enhancing the performance of downstream tasks such as cancer subtype classification.
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Content] Vision and Language, [Experience] Multimedia Applications
Relevance To Conference: This study contributes to the field of multimedia and multimodal processing by addressing the challenge of integrating multimodal expert knowledge. We introduce PathUp, a novel generative model tailored specifically for learning the generation of multi-resolution lesion subtypes from pathology image-text pairs, filling a gap in a domain where established methodologies are lacking. Through the incorporation of patho-align and a feature entropy loss function, our approach enhances synthetic images by leveraging multimodal expert pathology knowledge, thereby enriching inter-class variety. Additionally, the integration of a patch-wise timestep tracking strategy within the latent diffusion model framework substantially improves the model's capability to produce high-resolution images while effectively addressing tiling artifacts. Our method demonstrates proficiency in generating realistic pathology images across various resolutions, offering significant potential for data augmentation, particularly in tasks such as lesion subtype classification. Consequently, our research constitutes a novel contribution to the advancement of multimodal processing methodologies.
Supplementary Material: zip
Submission Number: 3286