DisCoVAE: Disentangling pretrained latent spaces with customized controls using contrastive learning

27 Mar 2026 (modified: 22 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: Deep generative models have recently achieved high-quality results but still offer limited customizable control. Existing methods mainly rely on using labels as additional inputs to directly condition the generation process. This constrains the model to be retrained from scratch and may ignore the high-level features already captured in the latent space. In this paper, we propose a new approach that reshapes the latent space of pretrained generative models based on user-specified groups of samples. Our method relies on a variational autoencoder trained with an additional supervised contrastive learning regularization. This yields a new control space that disentangles the features of interest based on the underlying variations found within the custom groups. We propose an iterative approach that disentangles a single labeled feature at a time from the remaining latent factors without additional supervision, which enables building the control space gradually. We show that our method outperforms state-of-the-art disentanglement approaches on reference datasets, while also enabling high-quality image synthesis with fine-grained continuous controls on real-world datasets.
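The abstract describes regularizing a VAE latent space with a supervised contrastive term computed over user-specified groups. The paper's exact objective is not given here, so the following is only an illustrative sketch of the standard supervised contrastive loss (in the style of Khosla et al.) that such a regularizer could be based on; the function name, temperature value, and the assumption that each group contains at least two samples are mine, not the authors'.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Illustrative supervised contrastive loss on latent codes.

    z:      (N, d) array of latent codes (e.g. VAE encoder outputs)
    labels: (N,) array of user-specified group ids; each group is
            assumed to contain at least two samples
    tau:    temperature controlling the sharpness of the softmax
    """
    # L2-normalize so similarities are cosine similarities
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                      # (N, N) pairwise similarities
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Log-softmax over all samples except the anchor itself
    sim_others = np.where(self_mask, -np.inf, sim)
    m = sim_others.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(sim_others - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom               # log p(j | anchor i)

    # Positives: same group id, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    # Mean log-probability of positives per anchor, negated and averaged
    per_anchor = -(log_prob * pos_mask).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()
```

In a training loop, this term would be added to the usual VAE objective (reconstruction plus KL), pulling latent codes of same-group samples together and pushing different groups apart, which is the mechanism that carves out the control dimensions described above.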
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andriy_Mnih1
Submission Number: 8131