FiGuRO - Intrinsic Dimension Estimation for Multi-Modal Data

ICLR 2026 Conference Submission9424 Authors

17 Sept 2025 (modified: 21 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Representation Learning, Intrinsic Dimension, Multi-modal Learning, Interpretability
Abstract: A fundamental challenge in representation learning is determining the complexity, or the Intrinsic Dimension (ID), of the data. This becomes especially difficult in the multi-modal setting when trying to learn disentangled subspaces for shared and private (modality-specific) information. Existing ID estimation techniques are ill-suited for this task, as they are either static and uni-modal or, in the case of state-of-the-art contrastive methods, adapt only to the shared ID implicitly. This leaves a critical gap for a method that can estimate the complete ID structure of multi-modal data. We introduce Fidelity-Guided Rank Optimization (FiGuRO), a framework for learning the IDs of uni- and multi-modal data. FiGuRO learns the dimensions of low-rank projections using truncated singular value decomposition and an algorithm that determines when to reduce or increase dimensionalities and in which latent spaces. We demonstrate that FiGuRO outperforms other ID estimation techniques and is more robust to hyperparameter changes. In the multi-modal setting, FiGuRO successfully decomposes shared and modality-specific information and captures differences between scales of IDs and varying ratios between the subspaces on simulations and real datasets. Our work provides a quantitative framework for assessing the shared and private informational contributions of multi-modal data. This helps construct more interpretable models and can guide strategic and efficient data collection in fields like biology and medicine.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9424
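
The abstract mentions low-rank projections obtained through truncated singular value decomposition, with the rank adjusted according to fidelity. The sketch below illustrates only that general idea and is not the FiGuRO algorithm: the function names, the explained-variance criterion, and the `threshold` parameter are assumptions made for illustration, and the page does not describe the authors' actual procedure for increasing or reducing dimensions.

```python
import numpy as np


def truncated_svd_project(x, rank):
    """Return the best rank-`rank` reconstruction of `x` (rows are samples).

    Hypothetical helper: centers the data, keeps the top `rank` singular
    directions, and maps the result back to the original space.
    """
    mean = x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank] + mean


def smallest_rank_with_fidelity(x, threshold=0.99):
    """Return the smallest rank whose explained-variance fraction >= threshold.

    Assumed stand-in for a fidelity criterion; not taken from the submission.
    """
    centered = x - x.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    # First index where cumulative energy reaches the threshold, as a 1-based rank.
    return int(np.searchsorted(energy, threshold) + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3))    # 3 latent factors
    mixing = rng.normal(size=(3, 10))     # linear embedding into 10 dimensions
    data = latent @ mixing
    rank = smallest_rank_with_fidelity(data, threshold=0.9999)
    error = np.linalg.norm(data - truncated_svd_project(data, rank))
    print("estimated rank:", rank, "reconstruction error:", error)
```

A linear criterion of this kind only recovers the dimension of a linear subspace. Estimating shared and private dimensions across modalities, as the abstract describes, requires more than this single-matrix sketch.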