Ordered $\mathcal{V}$-information Growth: A Fresh Perspective on Shared Information

Published: 22 Jan 2025, Last Modified: 11 Mar 2025
Venue: AISTATS 2025 Poster
License: CC BY 4.0
TL;DR: We propose a new measure of computation-aware shared information which can be used for both MI estimation and dataset complexity estimation.
Abstract: Mutual information (MI) is widely employed as a measure of shared information between random variables. However, MI assumes unbounded computational resources, a condition rarely met in practice, where predicting a random variable $Y$ from $X$ must rely on finite resources. $\mathcal{V}$-information addresses this limitation by employing a predictive family $\mathcal{V}$ to emulate computational constraints, yielding a directed measure of shared information. Focusing on the mixed setting (continuous $X$ and discrete $Y$), here we highlight the upward bias of empirical $\mathcal{V}$-information, $\hat I_{\mathcal{V}}(X \rightarrow Y)$, even when $\mathcal{V}$ is low-complexity (e.g., shallow neural networks). To mitigate this bias, we introduce $\mathcal{V}$-Information Growth (VI-Growth), defined as $\hat I_{\mathcal{V}}(X \rightarrow Y) - \hat I_{\mathcal{V}}(X' \rightarrow Y')$, where $(X', Y') \sim P_X P_Y$ are independent variables drawn from the product of the marginals. While VI-Growth effectively counters over-estimation, more complex predictive families may lead to under-estimation. To address this, we construct a sequence of predictive families $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}$ of increasing complexity and compute the maximum of VI-Growth across these families, yielding the ordered VI-Growth (O-VIG). We provide theoretical results that justify this approach, showing that O-VIG is a provably tighter lower bound for the true $\mathcal{V}$-information than empirical $\mathcal{V}$-information itself, and exhibits stronger convergence properties than $\mathcal{V}$-information. Empirically, O-VIG alleviates bias and consistently outperforms state-of-the-art methods in both MI estimation and dataset complexity estimation, demonstrating its practical utility.
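The two definitions in the abstract (VI-Growth as a difference of empirical $\mathcal{V}$-information values, and O-VIG as its maximum over an ordered sequence of families) can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several assumptions that are not in the abstract: the predictive family $\mathcal{V}_k$ is taken to be the set of predictors that are constant on each of $k$ quantile bins of a scalar $X$ (chosen because its empirical risk minimizer has a closed form, in place of the neural networks the abstract mentions); the independent pair $(X', Y')$ is emulated by permuting the labels and averaging over a few permutations; and all quantities are computed in-sample, in nats. The function names `empirical_v_info`, `vi_growth`, and `o_vig` are hypothetical.

```python
import numpy as np

def empirical_v_info(x, y, n_bins, n_classes):
    """Empirical V-information for the (assumed) family of predictors that are
    constant on each of n_bins quantile bins of x. For this family the best
    in-sample predictor is the per-bin label frequency, so the estimate reduces
    to the plug-in mutual information between the bin index and y (in nats)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, x)                 # bin index in [0, n_bins)
    counts = np.zeros((n_bins, n_classes))
    np.add.at(counts, (bins, y), 1)                  # joint histogram of (bin, label)
    joint = counts / counts.sum()
    p_bin = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                                 # treat 0 * log 0 as 0
    ratio = joint[mask] / (p_bin @ p_y)[mask]
    return float((joint[mask] * np.log(ratio)).sum())

def vi_growth(x, y, n_bins, n_classes, rng, n_shuffles=20):
    """VI-Growth: empirical V-information on the data minus the same estimate
    on an independent pair. Assumption: independence is emulated by permuting
    y, and the baseline is averaged over n_shuffles permutations."""
    observed = empirical_v_info(x, y, n_bins, n_classes)
    baseline = np.mean([
        empirical_v_info(x, rng.permutation(y), n_bins, n_classes)
        for _ in range(n_shuffles)
    ])
    return observed - baseline

def o_vig(x, y, n_classes, rng, bin_sizes=(2, 4, 8, 16, 32, 64)):
    """Ordered VI-Growth: maximum of VI-Growth over families of increasing
    complexity (here, an assumed schedule of increasing bin counts)."""
    return max(vi_growth(x, y, k, n_classes, rng) for k in bin_sizes)

# Toy mixed setting: binary label y, continuous x drawn from N(2y - 1, 1).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
x = rng.normal(2.0 * y - 1.0, 1.0)
print("empirical V-information, 64 bins:", empirical_v_info(x, y, 64, 2))
print("O-VIG:", o_vig(x, y, n_classes=2, rng=rng))
```

On independent data the raw estimate with many bins stays strictly positive, which is the upward bias the abstract refers to, whereas the permutation baseline removes most of it. Taking the maximum over bin counts guards against the under-estimation that a single family can produce, whether because it is too coarse to capture the dependence or because subtracting a large baseline over-corrects a flexible one.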
Submission Number: 276