Keywords: multi-modal learning, disentanglement, conditional sufficiency, reconstruction
Abstract: Learning disentangled representations is a fundamental task in multi-modal learning.
In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses.
Ideally, modality-specific features should be independent of shared ones while also capturing all complementary information within each modality.
This tradeoff is naturally expressed through information-theoretic criteria, but mutual-information–based objectives are difficult to estimate reliably, and their variational surrogates often underperform in practice.
In this paper, we introduce \ours, a novel disentangled representation learning approach that addresses this challenge by combining an independence-enforcing objective with a computationally efficient reconstruction loss that bounds conditional mutual information. This formulation explicitly balances independence and completeness, enabling principled extraction of modality-specific features.
We demonstrate the effectiveness of \ours on synthetic simulations, a CITE-seq dataset and multiple real-world multi-modal benchmarks.
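The abstract does not say which independence criterion or decoder \ours uses. The sketch below is a hypothetical illustration of an objective with the same two-part shape, not the paper's method. It makes two assumptions: the independence term is an HSIC penalty with Gaussian kernels between shared features `z_shared` and modality-specific features `z_specific`, and reconstruction goes through a hypothetical linear decoder `decoder_w`.

Why reconstruction relates to conditional mutual information: I(X; Z_s | Z_c) = H(X | Z_c) - H(X | Z_c, Z_s). Any decoder's expected negative log-likelihood upper-bounds H(X | Z_c, Z_s). A Gaussian decoder with fixed variance turns that likelihood into MSE plus a constant. So lowering reconstruction error raises a lower bound on the complementary information that Z_s carries beyond Z_c.

```python
import numpy as np


def gaussian_gram(z: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix over the rows of z."""
    sq = np.sum(z ** 2, axis=1, keepdims=True)
    # Pairwise squared distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq + sq.T - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = a.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = gaussian_gram(a, sigma)
    l = gaussian_gram(b, sigma)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)


def disentangle_loss(x, z_shared, z_specific, decoder_w, lam=1.0, sigma=1.0):
    """Hypothetical objective: reconstruction MSE from (z_shared, z_specific)
    plus lam times an HSIC independence penalty between the two feature sets."""
    z = np.concatenate([z_shared, z_specific], axis=1)
    x_hat = z @ decoder_w  # hypothetical linear decoder
    recon = float(np.mean((x - x_hat) ** 2))
    return recon + lam * hsic(z_shared, z_specific, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    z_c = rng.normal(size=(n, 2))
    z_indep = rng.normal(size=(n, 2))                # independent of z_c
    z_dep = z_c + 0.1 * rng.normal(size=(n, 2))      # strongly dependent on z_c
    print("HSIC independent:", hsic(z_c, z_indep))
    print("HSIC dependent:  ", hsic(z_c, z_dep))
```

The weight `lam` sets the balance between the two terms. A larger value pushes the specific features toward independence from the shared ones. A smaller value favors completeness of the reconstruction. HSIC appears here only because it is a standard differentiable independence measure. The abstract does not indicate whether \ours uses it.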
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11831