Abstract: Medical generative models, known for their high-quality sample generation, have accelerated the rapid growth of medical applications. However, recent works concentrate on separate generative models for distinct medical tasks and are restricted by inadequate medical multimodal knowledge, constraining comprehensive medical diagnosis. In this paper, we propose MedM2G, a Medical Multi-Modal Generative framework, whose key innovation is to align, extract, and generate multiple medical modalities within a unified model. Extending beyond single or dual medical modalities, we efficiently align medical modalities through a central alignment approach in a unified space. Significantly, our framework extracts valuable clinical knowledge by preserving the visual invariants of each medical imaging modality, thereby enhancing modality-specific information for multimodal generation. By conditioning adaptive cross-guided parameters into a multi-flow diffusion framework, our model promotes flexible interactions among medical modalities during generation. MedM2G is the first medical generative model to unify the generation tasks of text-to-image, image-to-text, and unified generation of medical modalities (CT, MRI, X-ray). It performs 5 medical generation tasks across 10 datasets, consistently outperforming various state-of-the-art methods.