Track: long paper (up to 5 pages)
Keywords: Associative Memories, Energy-Based models, Probabilistic modeling, Pseudo-Likelihood
TL;DR: Optimizing the pseudo-likelihood can be interpreted as building an associative memory whose attractors correspond to previously unseen examples.
Abstract: The classical approach to inference in energy-based probabilistic models, maximizing the likelihood of the data, is limited by the difficulty of estimating the partition function. A common strategy to avoid this problem is to maximize the pseudo-likelihood instead, which only requires easily computable normalizations. In this work, we offer the perspective that the pseudo-likelihood is more than just an approximation of the likelihood in inference problems: we show that, at zero temperature, models trained by maximizing the pseudo-likelihood are associative memories. We first demonstrate this with uncorrelated binary examples, which are memorized with basins of attraction larger than those produced by any other known learning rule for Hopfield models. We then test this behavior on progressively more complex datasets, showing that such models exploit the structure of the data to produce meaningful attractors, which in some cases correspond precisely to examples from the test set.
Submission Number: 30
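The abstract outlines a two-step pipeline: fit a pairwise binary energy-based model by gradient ascent on the pseudo-likelihood (each spin's conditional distribution is a logistic function of its local field, so no partition function is needed), then read out memories by running zero-temperature asynchronous dynamics on the learned energy. Below is a minimal illustrative sketch of that pipeline in NumPy; the symmetric zero-diagonal couplings `J`, the learning rate, the number of epochs, and the retrieval loop are assumptions made for illustration, not the paper's exact implementation.

```python
# Illustrative sketch (not the paper's exact method): pseudo-likelihood training of a
# pairwise binary model, followed by zero-temperature retrieval dynamics.
import numpy as np

rng = np.random.default_rng(0)

def pseudo_log_likelihood(J, X):
    """Mean log pseudo-likelihood of +/-1 patterns X (P x N) under couplings J (N x N).

    For each spin i, P(x_i | rest) is a logistic function of the local field
    h_i = sum_j J_ij x_j, so only per-spin normalizations are required.
    """
    H = X @ J.T                                  # local fields, shape (P, N)
    return np.mean(-np.log1p(np.exp(-2.0 * X * H)))

def train_pseudo_likelihood(X, lr=0.05, epochs=500):
    """Gradient ascent on the pseudo-likelihood; returns symmetric, zero-diagonal couplings."""
    P, N = X.shape
    J = np.zeros((N, N))
    for _ in range(epochs):
        H = X @ J.T
        # d/dJ_ij of the mean log pseudo-likelihood: x_i x_j * sigmoid(-2 x_i h_i), averaged over patterns
        G = (X * (1.0 / (1.0 + np.exp(2.0 * X * H)))).T @ X / P
        J += lr * G
        np.fill_diagonal(J, 0.0)
        J = 0.5 * (J + J.T)                      # keep couplings symmetric (Hopfield-style)
    return J

def zero_temperature_dynamics(J, x, sweeps=50):
    """Asynchronous sign updates until a fixed point (an attractor) is reached."""
    x = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(x.size):
            h = J[i] @ x
            s = x[i] if h == 0 else np.sign(h)   # keep the current state on a zero field
            if s != x[i]:
                x[i] = s
                changed = True
        if not changed:
            break
    return x

# Usage: store uncorrelated binary patterns and recover one from a noisy cue.
N, P = 100, 10
patterns = rng.choice([-1.0, 1.0], size=(P, N))
J = train_pseudo_likelihood(patterns)
noisy = patterns[0] * np.where(rng.random(N) < 0.2, -1.0, 1.0)   # flip ~20% of the spins
recovered = zero_temperature_dynamics(J, noisy)
print("final pseudo-log-likelihood:", pseudo_log_likelihood(J, patterns))
print("overlap with the stored pattern:", recovered @ patterns[0] / N)
```

On uncorrelated patterns at low load such as this, the learned couplings typically make the stored patterns fixed points of the zero-temperature dynamics with sizeable basins of attraction, which is the associative-memory behavior the abstract describes.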