Dual-Level Disentanglement ($\text{DL}^2$): Task-Adaptive Disentanglement for Resolving the Task-Generation Dilemma
Keywords: Multimodal Variational Autoencoder, Variational Autoencoder, Adaptive Disentanglement, Target Representation Learning, Multimodal Generative Learning
Abstract: Multimodal learning faces a task-generation dilemma: discriminative tasks require a purified, task-specific subset of semantics, whereas generative tasks demand the complete shared information, forcing a trade-off within any single model.
To resolve this, we propose task-adaptive disentanglement (TADL), a paradigm that dynamically disentangles representations guided by task-specific supervisory signals. We instantiate this paradigm with the dual-level disentanglement ($\text{DL}^2$) framework, which leverages contrastive signals as a practical and efficient form of weak supervision.
$\text{DL}^2$ first separates modality-private information from shared information (Level-1) and then adaptively decomposes the shared representation into a task-relevant component and a residual component that preserves generative integrity (Level-2). This second-level disentanglement is driven by two regularizers: a virtual modality pair method for positive pairs and a common-cause mutual information (CCMI) metric for negative pairs. Extensive experiments on multimodal clustering demonstrate that $\text{DL}^2$ achieves state-of-the-art task performance without compromising generative quality within a single model.
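To make the two-level split concrete, the following is a minimal PyTorch sketch of the encoder structure the abstract describes. All module names, layer sizes, and the choice of plain linear heads are illustrative assumptions, not the authors' implementation; the contrastive regularizers (the virtual modality pair loss and CCMI) are omitted.

```python
# Minimal sketch of the dual-level disentanglement described above.
# Dimensions and module names are hypothetical, not from the paper.
import torch
import torch.nn as nn

class DualLevelEncoder(nn.Module):
    def __init__(self, x_dim=256, private_dim=32, shared_dim=64, task_dim=16):
        super().__init__()
        # Level-1: a per-modality encoder emits a private code and a shared code.
        self.backbone = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU())
        self.to_private = nn.Linear(128, private_dim)
        self.to_shared = nn.Linear(128, shared_dim)
        # Level-2: the shared code is decomposed into a task-relevant
        # component and a residual component kept for generation.
        self.to_task = nn.Linear(shared_dim, task_dim)
        self.to_residual = nn.Linear(shared_dim, shared_dim - task_dim)

    def forward(self, x):
        h = self.backbone(x)
        z_private = self.to_private(h)           # modality-private code (Level-1)
        z_shared = self.to_shared(h)             # cross-modal shared code (Level-1)
        z_task = self.to_task(z_shared)          # task-relevant component (Level-2)
        z_residual = self.to_residual(z_shared)  # residual component (Level-2)
        return z_private, z_task, z_residual

# Usage: one encoder instance per modality; downstream losses would act
# on z_task (discriminative) and on (z_private, z_task, z_residual) jointly
# for reconstruction.
enc = DualLevelEncoder()
z_p, z_t, z_r = enc(torch.randn(8, 256))
```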
Primary Area: generative models
Submission Number: 14796