A correlation analysis approach to finding interpretable latent representations via conditional generative models

Published: 03 Feb 2026, Last Modified: 03 Feb 2026
Venue: AISTATS 2026 Poster
License: CC BY 4.0
Abstract: The supervised disentanglement problem, that is, learning interpretable nonlinear latent representations of a target data view while being informed by an auxiliary data view, is a central challenge of interpretable machine learning. We reformulate this problem as a partially linear invertible canonical correlation analysis (PLiCCA). Specifically, given two data views, (i) complex data lying near a potentially high-dimensional manifold, and (ii) auxiliary high-dimensional multivariate data, our approach represents the complex data with latent variables that are maximally correlated with sparse linear combinations of the auxiliary variables. This yields an embedding ordered by interpretability, in contrast to regression-based approaches to supervised disentanglement. We formalize the population PLiCCA problem and provide existence results. We then establish a close theoretical connection between PLiCCA and well-established conditional latent variable models, specifically conditional variational autoencoders and conditional normalizing flows, enabling practical estimation. We demonstrate the utility of our approach on brain morphological data, where our learned embeddings are guided by demographic, psychometric, and behavioral variables, facilitating scientific interpretation and improving generalization.
Submission Number: 2149