Keywords: Mutual Information, Self-supervised learning
Abstract: Many self-supervised representation learning methods maximize mutual information (MI) across views. In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews. For example, given two views $x$ and $y$ of the same input example, we can split $x$ into two subviews, $x^{\prime}$ and $x^{\prime\prime}$, which depend only on $x$ but are otherwise unconstrained. The following holds: $I(x; y) \geq I(x^{\prime\prime}; y) + I(x^{\prime}; y | x^{\prime\prime})$, due to the chain rule and the data processing inequality. By maximizing both terms in the decomposition, our approach explicitly rewards the encoder for any information about $y$ which it extracts from $x^{\prime\prime}$, and for information about $y$ extracted from $x^{\prime}$ in excess of the information from $x^{\prime\prime}$. We provide a novel contrastive lower bound on conditional MI that relies on sampling contrast sets from $p(y|x^{\prime\prime})$. By decomposing the original MI into a sum of increasingly challenging MI bounds between sets of increasingly informed views, our representations can capture more of the total information shared between the original views. We empirically test the method in a vision domain and for dialogue generation.
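The decomposition stated in the abstract can be spelled out in two steps. This is a sketch under the abstract's own assumption that the subviews $x^{\prime}$ and $x^{\prime\prime}$ depend only on $x$, so that $y \to x \to (x^{\prime}, x^{\prime\prime})$ forms a Markov chain; the intermediate joint term $I(x^{\prime}, x^{\prime\prime}; y)$ is introduced here for illustration and does not appear in the abstract.

$$
\begin{aligned}
I(x; y) &\geq I(x^{\prime}, x^{\prime\prime}; y) && \text{(data processing inequality)} \\
&= I(x^{\prime\prime}; y) + I(x^{\prime}; y \mid x^{\prime\prime}) && \text{(chain rule for MI)}
\end{aligned}
$$

The first step holds with equality when the pair of subviews retains all the information about $y$ that $x$ contains.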
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We present a bound on conditional MI and use it to maximize the MI decomposition across views for representation learning.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=qBwWJHc5oI