Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

Qing Guo; Junya Chen; Dong Wang; Yuewei Yang; Xinwei Deng; Jing Huang; Lawrence Carin; Fan Li; Chenyang Tao


Published: 31 Oct 2022 | Last Modified: 06 Apr 2025 | NeurIPS 2022 Accept | Readers: Everyone

Keywords: mutual information, variational inference, contrastive learning, few-shot learning, meta learning

TL;DR: We present FLO, a novel contrastive variational mutual information bound that better balances the bias-variance trade-off.

Abstract: Successful applications of InfoNCE (Information Noise-Contrastive Estimation) and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds through the lens of unnormalized statistical modeling and convex optimization. Our investigation yields a new unified theoretical framework encompassing popular variational MI bounds, and leads to a novel, simple, and powerful contrastive MI estimator we name FLO. Theoretically, we show that the FLO estimator is tight, and it converges under stochastic gradient descent. Empirically, the proposed FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using extensive benchmarks, and we further inspire the community with novel applications in meta-learning. Our presentation underscores the foundational importance of variational MI estimation in data-efficient learning.
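
The abstract's remark that InfoNCE-style estimators depend on large batches can be illustrated with a small sketch. It is not the paper's FLO estimator and is not taken from the authors' code; it only shows the standard InfoNCE estimate, which cannot exceed log K for a batch of K pairs. The function names `infonce_estimate` and `diagonal_scores`, and the toy critic scores, are hypothetical and chosen for illustration.

```python
import math


def infonce_estimate(scores):
    """InfoNCE estimate from a K x K matrix of critic scores.

    scores[i][j] stands for a critic value f(x_i, y_j) (an assumed,
    illustrative critic). Diagonal entries are the paired (positive)
    samples; the other entries in each row act as negatives.
    """
    k = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Numerically stable log-mean-exp over the row.
        m = max(row)
        log_mean_exp = m + math.log(sum(math.exp(s - m) for s in row) / k)
        total += row[i] - log_mean_exp
    return total / k


def diagonal_scores(k, gap):
    """Toy scores: positives score `gap`, all negatives score 0."""
    return [[gap if i == j else 0.0 for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Even with a near-perfect critic, the estimate saturates near log K.
    for k in (4, 64):
        est = infonce_estimate(diagonal_scores(k, gap=50.0))
        print(f"K={k}: estimate={est:.4f}, log K={math.log(k):.4f}")
```

Because the estimate is capped at log K, measuring a large MI value with this estimator requires a correspondingly large batch, which is the cost the abstract refers to.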

Supplementary Material: pdf

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/tight-mutual-information-estimation-with/code) (CatalyzeX)
