MediBench: A Benchmark for VAEs in Medical Imaging Across Fidelity, Structure, and Latent Utility

ICLR 2026 Conference Submission 21635 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Variational autoencoders; medical image analysis; medical image reconstruction

TL;DR: Benchmarking VAEs for reconstruction, clinical structure preservation, and latent utility in medical imaging

Abstract: High-resolution medical images pose considerable computational challenges for training deep learning models. While modern architectures continue to achieve strong performance, these demands have motivated a shift toward latent space–based approaches. In generative modeling in particular, Variational Autoencoders (VAEs) provide an efficient foundation for representation learning. The effectiveness of this paradigm, however, depends on the VAE's ability to fulfill a dual mandate: preserving sufficient information for downstream understanding tasks while enabling high-fidelity image generation. Despite the central role of this dual capability, the medical imaging community lacks a standardized framework for evaluating it systematically. To fill this gap, we introduce MediBench, a comprehensive benchmark of how existing VAEs perform in the medical domain. Our framework evaluates VAEs across three pillars: (1) Reconstruction Fidelity and (2) Clinical Structure Preservation, which together assess whether reconstructions maintain essential clinically relevant structures, and (3) Latent Space Utility, which measures how effectively the learned latent space supports clinically relevant downstream analyses. We conduct an extensive evaluation on a diverse suite of medical datasets, comparing a wide range of general-purpose and medical-specific VAE architectures across 2D and 3D modalities. Our analysis reveals consistent trade-offs across the three pillars: tokenized and vector-quantized VAEs learn stronger latents than continuous VAEs, medical pretraining improves transfer and structural preservation, and higher pixel fidelity often does not translate into downstream gains. MediBench provides a standardized and clinically grounded tool for selecting and developing VAEs in medicine, in support of reliable and efficient foundation models for medical AI.

Primary Area: datasets and benchmarks

Submission Number: 21635
