A Hyperparameter Benchmark of VAE-Based Methods for scRNA-seq Batch Integration

Mohamad Kassab; Eduardo da Veiga Beltrame; Luiz Maniero

17 Sept 2025 (modified: 11 Feb 2026); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: Single-cell RNA-seq, batch integration, variational autoencoders, scVI, MrVI, LDVAE, hyperparameter benchmarking

TL;DR: We systematically benchmark hyperparameters of VAE-based models for scRNA-seq integration, showing trade-offs between batch removal and biological signal preservation across datasets.

Abstract: We present the first systematic benchmark of model architecture hyperparameters for variational autoencoder (VAE)-based methods for single-cell RNA sequencing (scRNA-seq) batch integration. We focus on models available in the scvi-tools framework and compare scVI, MrVI, and LDVAE across four datasets with heterogeneous designs under two feature regimes: training on all genes and training on only highly variable genes (HVGs). Our study executes 960 training runs spanning 120 configurations of the three models that vary latent size, network depth, and network width. We evaluate each run with a comprehensive, standardized metric suite from the scib package that captures both batch removal and biological conservation (Batch ASW, PCR-batch, iLISI, graph connectivity, NMI, ARI, label ASW, isolated-label F1/ASW, cLISI, and trajectory conservation), complemented by qualitative analysis with UMAP and t-SNE, and compare against PCA, random projection, and unintegrated baselines. We find trade-offs across datasets: scVI delivers the strongest overall integration, driven by superior batch correction; LDVAE shows dataset-specific gains in biological structure preservation; and MrVI is stable and achieves the best batch correction on multi-protocol datasets but is more resource-intensive. Training on HVGs generally outperforms full-gene training for all models. The hyperparameter analysis indicates that moderate to high latent dimensionality (more than 30 dimensions) often yields the best balance between batch removal and biological conservation. Sensitivity to latent size appears to be related to dataset heterogeneity (diverse tissues, laboratories, chemistries, and gene-coverage profiles), and larger latent spaces tend to improve batch mixing but can reduce biological conservation. We provide model- and dataset-specific guidelines that translate our analysis into practical defaults and tuning rules for deploying VAE-based integration in single-cell studies.
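The run count in the abstract is consistent with 120 configurations × 4 datasets × 2 feature regimes = 960, although the abstract does not spell out this breakdown. A minimal sketch of how such a run grid could be enumerated follows. The dataset names, the specific latent sizes, depths, and widths, the even split of 40 configurations per model, and the key names `n_latent`, `n_layers`, and `n_hidden` are all illustrative assumptions, not the authors' actual grid.

```python
from itertools import product

# All values below are illustrative assumptions, not the authors' actual grid.
MODELS = ["scVI", "MrVI", "LDVAE"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3", "dataset_4"]  # placeholder names
FEATURE_REGIMES = ["all_genes", "hvg"]

# Hypothetical architecture grid: 5 * 2 * 4 = 40 configurations per model.
LATENT_SIZES = [10, 20, 30, 50, 100]
N_LAYERS = [1, 2]
N_HIDDEN = [64, 128, 256, 512]


def build_runs():
    """Enumerate one run per (model, architecture, dataset, feature regime)."""
    # One configuration = a model paired with one architecture setting.
    configs = [
        {"model": m, "n_latent": z, "n_layers": d, "n_hidden": w}
        for m, z, d, w in product(MODELS, LATENT_SIZES, N_LAYERS, N_HIDDEN)
    ]
    # Each configuration is trained on every dataset under both feature regimes.
    runs = [
        {**cfg, "dataset": ds, "features": fr}
        for cfg, ds, fr in product(configs, DATASETS, FEATURE_REGIMES)
    ]
    return configs, runs


configs, runs = build_runs()
print(len(configs), len(runs))  # 120 960
```

Under these assumptions, each entry in `runs` would correspond to one training job whose latent embedding is then scored with the scib metric suite.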

Primary Area: datasets and benchmarks

Submission Number: 8581
