- Abstract: We introduce a new technique to learn correlations between two types of data. The learned representation can be used to directly compute the expectations of functions over one type of data conditioned on the other, such as Bayesian estimators and their standard deviations. Specifically, our loss function teaches two neural nets to extract features representing the probability vectors with the highest singular values of the stochastic map (set of conditional probabilities) implied by the joint dataset, relative to the inner products defined by the Fisher information metrics evaluated at the marginals. We test the approach using a synthetic dataset, analytical calculations, and inference on occluded MNIST images. Surprisingly, when applied to supervised learning (where one dataset consists of labels), this approach automatically provides regularization and converges faster than the cross-entropy objective. We also explore using this approach to discover salient independent features of a single dataset.
- Code: https://github.com/cbeny/RFA
- Keywords: unsupervised learning, non-parametric probabilistic model, singular value decomposition, Fisher information metric, chi-squared distance
- TL;DR: Given bipartite data and two neural nets, this new objective based on Fisher information teaches them to extract the most correlated features, which can then be used for inference.
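For intuition, here is a minimal sketch of the quantity the objective targets, assuming both variables are discrete and the joint distribution is known exactly. This is not the authors' method: the paper trains two neural nets on samples, while this sketch simply computes an exact SVD. Under the Fisher metric at a marginal p, the inner product of two tangent vectors is sum_i u_i v_i / p_i. Rescaling coordinates by 1/sqrt(p) turns this into the Euclidean inner product, so the singular vectors of the stochastic map p(x|y) relative to these metrics come from the SVD of B[x, y] = p(x, y) / sqrt(p(x) p(y)). The largest singular value is always 1 (a trivial mode built from the marginals). The squared remaining singular values sum to the chi-squared divergence between the joint and the product of its marginals. The function names `fisher_normalized_joint` and `conditional_expectation`, the toy table `p_xy`, and the function `f` are hypothetical and exist only for illustration.

```python
import numpy as np

def fisher_normalized_joint(p_xy):
    """B[x, y] = p(x, y) / sqrt(p(x) p(y)) for a joint probability table p_xy."""
    p_x = p_xy.sum(axis=1)  # marginal of X (rows)
    p_y = p_xy.sum(axis=0)  # marginal of Y (columns)
    return p_xy / np.sqrt(np.outer(p_x, p_y))

def conditional_expectation(p_xy, f, k):
    """Approximate E[f(X) | Y = y] for every y from the top-k singular triples of B."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    u, s, vt = np.linalg.svd(fisher_normalized_joint(p_xy))
    # Rank-k reconstruction of B, then undo the rescaling:
    # p(x, y) ~ sqrt(p(x) p(y)) * B_k[x, y]
    b_k = (u[:, :k] * s[:k]) @ vt[:k, :]
    p_xy_k = np.sqrt(np.outer(p_x, p_y)) * b_k
    # Sum f(x) p_k(x, y) over x, then divide by p(y) to condition on Y = y
    return (f @ p_xy_k) / p_y

# Toy joint distribution: 3 values of X (rows) by 3 values of Y (columns).
p_xy = np.array([[0.30, 0.05, 0.05],
                 [0.05, 0.20, 0.05],
                 [0.05, 0.05, 0.20]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
f = np.array([0.0, 1.0, 2.0])  # hypothetical function f(x) to average

s = np.linalg.svd(fisher_normalized_joint(p_xy), compute_uv=False)
print("singular values:", s)  # the largest is 1 (trivial mode)
print("chi-squared from spectrum:", np.sum(s[1:] ** 2))
print("E[f|y], k=1:", conditional_expectation(p_xy, f, 1))  # same value for every y
print("E[f|y], k=3:", conditional_expectation(p_xy, f, 3))  # exact conditional mean
```

With k=1 only the trivial mode survives, so the reconstruction is p(x)p(y) and the conditional expectation collapses to the unconditional mean E[f(X)] for every y. Each additional singular triple restores part of the dependence between X and Y, and keeping all of them recovers the exact conditional expectation.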