Caveats for information bottleneck in deterministic scenarios

Artemy Kolchinsky; Brendan D. Tracey; Steven Van Kuyk

Caveats for information bottleneck in deterministic scenarios

Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk

Published: 21 Dec 2018, Last Modified: 22 Jun 2025ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X;T) and high mutual information I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T) for a given I(X;T), and is typically explored by maximizing the "IB Lagrangian", I(Y;T) - βI(X;T). In some cases, Y is a deterministic function of X, including many classification problems in supervised learning where the output class Y is a deterministic function of the input X. We demonstrate three caveats when using IB in any situation where Y is a deterministic function of X: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of β; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when Y is a small perturbation away from being a deterministic function of X, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.

TL;DR: Information bottleneck behaves in surprising ways whenever the output is a deterministic function of the input.

Keywords: information bottleneck, supervised learning, deep learning, information theory

Code: [![github](/images/github_icon.svg) artemyk/ibcurve](https://github.com/artemyk/ibcurve)

Data: [ImageNet](https://paperswithcode.com/dataset/imagenet), [MNIST](https://paperswithcode.com/dataset/mnist)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:1808.07593/code)

22 Replies

Loading