Multimodal Datasets with Controllable Mutual Information

ICLR 2026 Conference Submission23961 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, License: CC BY 4.0
Keywords: Flow Matching, Mutual Information, Directed Acyclic Graphs, Synthetic Data
TL;DR: We introduce a framework for generating highly multimodal datasets with explicitly calculable mutual information between modalities.
Abstract: We introduce a framework for generating highly multimodal datasets with explicitly calculable mutual information between modalities. This enables the construction of benchmark datasets that provide a novel testbed for systematic studies of mutual information estimators and multimodal self-supervised learning techniques. Our framework constructs realistic datasets with known mutual information using a flow-based generative model and a structured causal framework for generating correlated latent variables.
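The abstract does not specify the causal framework in detail. The following is only a minimal sketch of why mutual information can be "explicitly calculable," under two labeled assumptions. First, the correlated latents come from a linear Gaussian structural equation model over a DAG. In that case, the ground-truth MI between two modality blocks has the closed form I(X;Y) = ½ [log det Σ_X + log det Σ_Y − log det Σ_XY]. Second, each modality is produced by an invertible map of its own latents. Mutual information is invariant under invertible transformations applied separately to each variable, so the latent-space MI would carry over to the generated data. The names `sem_covariance`, `gaussian_mi`, and `sample_sem`, as well as the `sinh` and cubic maps standing in for a flow decoder, are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def sem_covariance(B, noise_var):
    """Covariance of the linear Gaussian SEM x = B @ x + e, e ~ N(0, diag(noise_var)).

    B[i, j] is the weight of edge j -> i. Nodes are assumed to be in
    topological order, so B is strictly lower-triangular and I - B is invertible.
    """
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B)  # x = (I - B)^{-1} e
    return A @ np.diag(noise_var) @ A.T


def gaussian_mi(cov, idx_x, idx_y):
    """Mutual information I(X; Y) in nats between two blocks of a Gaussian vector."""
    idx = list(idx_x) + list(idx_y)
    _, logdet_x = np.linalg.slogdet(cov[np.ix_(idx_x, idx_x)])
    _, logdet_y = np.linalg.slogdet(cov[np.ix_(idx_y, idx_y)])
    _, logdet_xy = np.linalg.slogdet(cov[np.ix_(idx, idx)])
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def sample_sem(B, noise_var, n_samples, rng):
    """Draw latent samples from the SEM, one row per sample."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    e = rng.normal(size=(n_samples, B.shape[0])) * np.sqrt(noise_var)
    return e @ A.T  # row-wise version of x = A @ e


# Toy DAG (hypothetical): a shared cause z (node 0) feeds one latent per
# modality, a (node 1) and b (node 2), with unit edge weights and unit noise.
B = np.zeros((3, 3))
B[1, 0] = 1.0  # z -> a
B[2, 0] = 1.0  # z -> b
noise_var = np.ones(3)

cov = sem_covariance(B, noise_var)
mi_ab = gaussian_mi(cov, [1], [2])  # 0.5 * log(4/3) ≈ 0.1438 nats

# Hypothetical stand-in for per-modality flow decoders: each map is strictly
# monotone (hence invertible) and acts on one modality only, so I(a; b) is
# the same for (obs_a, obs_b) as for the latents (a, b).
rng = np.random.default_rng(0)
latents = sample_sem(B, noise_var, 100_000, rng)
obs_a = np.sinh(latents[:, [1]])
obs_b = latents[:, [2]] ** 3 + latents[:, [2]]
```

Varying the edge weights in `B` moves `mi_ab` continuously. In this toy setting, this is one way a generator could dial in a target mutual information before any modality-specific decoding happens.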
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 23961