Mutual Information Estimation as a Difference of Entropies for Unsupervised Representation Learning

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Abstract: Contrastive losses have been successfully exploited in recent visual unsupervised representation learning methods. A contrastive loss is based on a lower-bound estimate of mutual information, whose known limitations include a batch-size dependency of $O(\log(n))$; this is commonly known as the negative-sample size problem. To cope with this limitation, non-contrastive methods have been proposed and shown to achieve outstanding performance. Non-contrastive methods, however, are limited in that they are not based on principled designs and their learning dynamics can be unstable. In this work, we derive a principled non-contrastive method in which mutual information is estimated as a difference of entropies, so that no negative sampling is needed. To the best of our knowledge, this is the first successful implementation of the difference-of-entropies estimator for visual unsupervised representation learning. Our method performs on par with or better than the state-of-the-art contrastive and non-contrastive methods. The main idea of our approach is to extend the Shannon entropy $H(\mathrm{Z})$ to the von Neumann entropy $S(\mathrm{Z})$. The von Neumann entropy can be shown to be a lower bound of the Shannon entropy, and it can be stably estimated with a small sample size. Additionally, we prove that the conditional entropy term $H(\mathrm{Z}_1 \mid \mathrm{Z}_2)$ is upper bounded by the negative cosine similarity in the case of weak Gaussian noise augmentation. Even though the derivation is limited to this special case of augmentation, it provides a justification for cosine similarity as the measure between positive samples.
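The sketch below illustrates the kind of objective the abstract describes: maximize a von Neumann entropy estimate $S(\mathrm{Z})$ of the embeddings while minimizing a negative-cosine-similarity surrogate for the conditional entropy $H(\mathrm{Z}_1 \mid \mathrm{Z}_2)$. This is not the authors' implementation; the density-matrix construction (unit-trace normalized covariance), the entropy averaging over the two views, and the weighting `alpha` are assumptions introduced here for illustration.

```python
# Minimal sketch of an "MI as a difference of entropies" style loss, under the
# assumptions stated above (not the paper's actual code).
import torch
import torch.nn.functional as F


def von_neumann_entropy(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Estimate S(Z) = -tr(rho log rho) from a batch of embeddings z of shape (n, d)."""
    z = z - z.mean(dim=0, keepdim=True)            # center the batch
    cov = z.T @ z / z.shape[0]                     # (d, d) sample covariance
    rho = cov / cov.trace().clamp_min(eps)         # normalize to a unit-trace density matrix
    evals = torch.linalg.eigvalsh(rho).clamp_min(eps)
    return -(evals * evals.log()).sum()


def entropy_difference_loss(z1: torch.Tensor, z2: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Loss ~ H(Z1|Z2) - S(Z): cosine alignment term minus von Neumann entropy term."""
    z1n, z2n = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    alignment = -(z1n * z2n).sum(dim=1).mean()     # negative cosine similarity (bounds H(Z1|Z2))
    entropy = 0.5 * (von_neumann_entropy(z1) + von_neumann_entropy(z2))
    return alpha * alignment - entropy


# Usage: z1, z2 are embeddings of two augmented views of the same images.
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = entropy_difference_loss(z1, z2)
```

Minimizing this loss pulls the two views together (the cosine term) while pushing the embedding spectrum toward higher von Neumann entropy, so no negative samples are required.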