Keywords: Diffusion model, medical image generation, frequency domain, MoE.
Abstract: Medical image synthesis is a critical solution to data scarcity, yet standard Latent Diffusion Models (LDMs) are often limited by their reliance on Variational Autoencoders (VAEs) pre-trained on RGB images. This reliance introduces domain shift and channel mismatch between the RGB training domain and grayscale medical scans, which degrades fine anatomical detail and amplifies reconstruction artefacts. To address these limitations, we introduce DCT-MoE, a diffusion model that replaces the learnable VAE latent space with a deterministic block-wise Discrete Cosine Transform (DCT) representation. Specifically, the proposed method maps grayscale images to a compact block-wise DCT representation that acts as a fixed, low-dimensional space. On top of this representation, a Mixture-of-Experts (MoE) backbone is integrated into the diffusion model, providing scalable expressivity without a proportional increase in computational cost. Extensive experiments on cardiac MRI and echocardiography generation demonstrate that DCT-MoE achieves higher image quality and inference efficiency than state-of-the-art spatial-domain LDMs and frequency-domain generation methods.
Primary Subject Area: Generative Models
Secondary Subject Area: Image Synthesis
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 185
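Illustrative sketch (not the authors' implementation): the abstract describes mapping grayscale images to a compact, deterministic block-wise DCT representation. The NumPy example below shows one way such a mapping could look. The block size of 8, the choice to keep only the lowest 4x4 frequencies per block, and the function names `dct_matrix`, `blockwise_dct`, and `blockwise_idct` are all assumptions made for illustration; the abstract does not specify how the representation is made compact.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return d


def blockwise_dct(image, block=8, keep=4):
    """Map an (H, W) grayscale image to per-block low-frequency DCT coefficients.

    Assumption: compaction is done by keeping the top-left keep x keep
    coefficients of each block; the source does not state this.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    d = dct_matrix(block)
    # Split into non-overlapping blocks: (H/b, W/b, b, b).
    blocks = image.reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T  # separable 2D DCT applied to every block
    return coeffs[..., :keep, :keep]


def blockwise_idct(coeffs, block=8):
    """Invert blockwise_dct, zero-filling the discarded high frequencies."""
    hb, wb, keep, _ = coeffs.shape
    full = np.zeros((hb, wb, block, block), dtype=coeffs.dtype)
    full[..., :keep, :keep] = coeffs
    d = dct_matrix(block)
    blocks = d.T @ full @ d  # inverse of an orthonormal transform
    return blocks.transpose(0, 2, 1, 3).reshape(hb * block, wb * block)
```

Because the transform is fixed and orthonormal, nothing is learned and the mapping is exactly invertible when every coefficient is kept (`keep == block`); with `keep < block` the reconstruction is a low-pass approximation of each block.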