- Abstract: Mutual information (MI) plays an important role in representation learning, but it is intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators for discovering useful representations. However, most existing methods cannot provide accurate, low-variance estimates of MI when the MI is large. We argue that, because MI itself is so difficult to estimate, estimating the gradients of MI is more appealing for representation learning than estimating MI directly. We therefore propose the Mutual Information Gradient Estimator (MIGE) for representation learning, based on score estimation of implicit distributions. MIGE exhibits tight and smooth gradient estimation of MI in the high-dimensional and large-MI settings. We apply MIGE to both unsupervised learning of deep representations based on InfoMax and the Information Bottleneck method. Experimental results indicate a remarkable performance improvement in learning useful representations.
- Keywords: Mutual Information, Score Estimation, Representation Learning, Information Bottleneck