Abstract: X-ray images play a vital role in intraoperative processes owing to their high resolution and fast imaging speed, and they greatly facilitate subsequent segmentation, registration and reconstruction. However, excessive X-ray exposure poses potential risks to human health. Data-driven algorithms that synthesize X-ray images from volume scans are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize X-ray images in an end-to-end manner using content and style disentanglement across three different image domains. Our method decouples the anatomical structure information from CT scans and the style information from unpaired real X-ray images and digitally reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. We also impose a supervised process by computing the similarity between computed real DRR images and synthesized DRR images. We further develop a pose attention module to strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower-dimensional 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset, achieving 97.8350, 0.0842 and 3.0938 in terms of FID, KID and a defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($\pi$-GAN, EG3D), CT2X-GAN achieves higher synthesis quality and greater realism with respect to real X-ray images.
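For readers unfamiliar with the KID metric reported above, the following is a minimal sketch of how a Kernel Inception Distance can be computed from two sets of image features. It is not the authors' evaluation code. It assumes the features have already been extracted (in practice from an Inception network), and random vectors stand in for them here. The function names `polynomial_kernel` and `kid_unbiased` are hypothetical and chosen for illustration only.

```python
import numpy as np


def polynomial_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3, the usual KID kernel."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3


def kid_unbiased(real_feats, fake_feats):
    """Unbiased estimate of the squared MMD between two feature sets."""
    m, n = real_feats.shape[0], fake_feats.shape[0]
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Drop the diagonal (self-similarity) terms so the within-set averages are unbiased.
    mean_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    mean_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return mean_rr + mean_ff - 2.0 * k_rf.mean()


# Assumption: random vectors stand in for Inception features of real and synthesized images.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 64))
same = rng.normal(size=(200, 64))               # drawn from the same distribution as `real`
shifted = rng.normal(loc=0.5, size=(200, 64))   # drawn from a shifted distribution

kid_same = kid_unbiased(real, same)
kid_shifted = kid_unbiased(real, shifted)
print(f"KID (same distribution):    {kid_same:.4f}")
print(f"KID (shifted distribution): {kid_shifted:.4f}")
```

A lower value indicates that the two feature distributions are closer, which is why a smaller KID is read as better agreement between synthesized and real X-ray images.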
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This manuscript introduces a novel image synthesis network, called CT2X-GAN, for generating realistic multi-view X-ray images from CT scans in an end-to-end manner. The network is trained on a multi-modal and multi-dimensional database from three domains: CT scans, clinically collected X-ray images and digitally reconstructed radiography images. The study not only explores the potential correlations between multi-modal and cross-dimensional images, but also enables multi-view sequential synthesis of X-ray images. This is closely aligned with the theme of multimedia in the generative AI era. Additionally, the study aims to improve the fidelity and accuracy of the synthesized X-ray results. By synthesizing X-ray images across a large number of anatomical structures and view angles, the study not only provides a considerable amount of realistic data for subsequent medical image analysis procedures, but also greatly reduces the potential risk to clinicians and patients caused by excessive X-ray exposure. This practical clinical value further increases the clinical significance, user experience and relevance of multimedia applications.
Supplementary Material: zip
Submission Number: 2799