Abstract: In medical image analysis, self-supervised learning (SSL) techniques have emerged to alleviate labeling demands, yet they still face training data scarcity owing to escalating resource requirements and privacy constraints. Numerous efforts employ generative models to produce high-fidelity, unlabeled 3D volumes across diverse modalities and anatomical regions. However, the intricate and hard-to-distinguish anatomical structures in the abdomen make abdominal CT volume generation uniquely challenging compared to other anatomical regions. To address this overlooked challenge, we introduce Locality-Aware Diffusion (Lad), a novel method tailored for high-quality 3D abdominal CT volume generation. We design a locality loss to refine crucial anatomical regions and devise a condition extractor to integrate abdominal priors into generation, enabling the production of large quantities of high-quality abdominal CT volumes essential for SSL tasks without additional data such as labels or radiology reports. Volumes generated by our method reproduce abdominal structures with remarkable fidelity, reducing the FID score from 0.0034 to 0.0002 on the AbdomenCT-1K dataset, closely mirroring authentic data and surpassing current methods. Extensive experiments demonstrate the effectiveness of our method in self-supervised organ segmentation, yielding improved mean Dice scores on two abdominal datasets. These results underscore the potential of synthetic data to advance self-supervised learning in medical image analysis.
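To make the abstract's "locality loss" concrete, here is a minimal NumPy sketch of one way such a loss could be formed: a standard diffusion denoising objective plus an extra error term restricted to a mask of crucial anatomical regions. The function name, the binary `region_mask`, and the weight `lam` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def locality_weighted_loss(pred_noise, true_noise, region_mask, lam=2.0):
    """Illustrative locality-weighted denoising loss (sketch, not the paper's code).

    pred_noise, true_noise: float arrays of shape (D, H, W) -- the noise the
        diffusion model predicts vs. the noise actually added at this timestep.
    region_mask: binary array of the same shape marking crucial anatomical regions.
    lam: hypothetical extra weight applied inside the masked regions.
    """
    sq_err = (pred_noise - true_noise) ** 2
    # Standard diffusion objective: mean squared error over all voxels.
    base = sq_err.mean()
    # Locality term: the same squared error averaged over masked voxels only.
    local = (sq_err * region_mask).sum() / np.maximum(region_mask.sum(), 1)
    return base + lam * local
```

In this sketch the masked regions contribute to both terms, so errors on crucial anatomy are penalized more heavily than errors elsewhere, which is the intuition behind refining those regions during training.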
Primary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: This work contributes to multimedia and multimodal processing by introducing a novel approach tailored for 3D CT volume generation, advancing data generation and the application of synthetic data in downstream tasks. The proposed Locality-Aware Diffusion (Lad) method specifically addresses the challenge of intricate and hard-to-distinguish anatomical structure details in volumes, a common issue in multimedia and multimodal processing, enhancing the fidelity of synthetic volumes. Through the integration of a locality loss and locality priors, the method facilitates the generation of high-quality volumes essential for self-supervised learning tasks, eliminating the need for additional labeled data. This not only relieves the burden of labeling demands but also ensures privacy compliance, making it relevant across various fields. Furthermore, the effectiveness of this approach underscores its potential to advance multimedia and multimodal processing through synthetic data utilization, offering insights for overcoming data challenges. Hence, this work provides a robust solution for generating high-fidelity, unlabeled 3D volumes.
Supplementary Material: zip
Submission Number: 1160