Abstract: We develop an empirical Bayes prior for probabilistic matrix factorization.
Matrix factorization models each cell of a matrix with two latent variables, one
associated with the cell's row and one associated with the cell's column.
How should the priors of these two latent variables be set?
Drawing from empirical Bayes principles, we consider estimating the priors from data, to
find those that best match the populations of row and column latent vectors.
Thus we develop the twin population prior.
We develop a variational inference algorithm to simultaneously learn the empirical priors
and approximate the corresponding posterior. We evaluate this approach with both
synthetic and real-world data on diverse applications: movie ratings, book ratings, single-cell gene expression data, and musical preferences. Without needing to tune Bayesian hyperparameters, we
find that the twin population prior leads to high-quality predictions, outperforming manually tuned priors.
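
As a rough illustration of the model structure described above, the sketch below simulates a matrix in which each cell depends on one row latent vector and one column latent vector, each drawn from its own population prior. The Gaussian-mixture form of the priors, the inner-product link, the Gaussian noise, and all function and parameter names are assumptions made for illustration; they are not taken from the paper. In the empirical Bayes setting, the prior parameters would be estimated from data rather than fixed by hand as they are here.

```python
import random


def sample_from_mixture(rng, weights, means, scale, dim):
    """Draw one latent vector from a Gaussian mixture (an assumed prior form)."""
    # Pick a mixture component, then draw every coordinate around its mean.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(means[k], scale) for _ in range(dim)]


def simulate_matrix(n_rows, n_cols, dim, row_prior, col_prior, noise=0.1, seed=0):
    """Simulate a matrix whose cell (i, j) depends on row latent i and column latent j."""
    rng = random.Random(seed)
    # Separate population priors for rows and columns.
    rows = [sample_from_mixture(rng, dim=dim, **row_prior) for _ in range(n_rows)]
    cols = [sample_from_mixture(rng, dim=dim, **col_prior) for _ in range(n_cols)]
    # Assumed likelihood: inner product of the two latents plus Gaussian noise.
    matrix = [
        [
            sum(u * v for u, v in zip(rows[i], cols[j])) + rng.gauss(0.0, noise)
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
    return rows, cols, matrix


# Hypothetical prior parameters, fixed by hand for this sketch only.
row_prior = {"weights": [0.5, 0.5], "means": [-1.0, 1.0], "scale": 0.3}
col_prior = {"weights": [0.7, 0.3], "means": [0.0, 2.0], "scale": 0.3}

rows, cols, matrix = simulate_matrix(4, 3, 2, row_prior, col_prior)
```

Giving rows and columns their own priors mirrors the "twin" structure: the two populations of latent vectors can have different shapes, and each prior is meant to match its own population.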
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=3U6yYhaQI6
Changes Since Last Submission: At the request of reviewer `XJwX`, we are uploading an updated version of the full manuscript.
Please note that we have also included, as supplementary material, a document that tracks the updates to the notation.
Our main changes are:
- We added two new baseline methods.
- We added details of the competing methods to the Appendix.
- We added multiple extra experiments, including the following:
  - Comparison to a baseline method based on normalizing flows
  - Comparison to a baseline method akin to hierarchical Poisson factorization
  - Experiments with additional numbers of mixture components in our simulation studies
  - Experiments to measure the robustness of our method to the choice of the number of mixture components on all real datasets
  - Experiments to measure the robustness of our method to the choice of the size of the row and column latents
  - Experiments to compare baselines with our method on simulated data
- We greatly expanded the discussion to better situate our method in the field.
- We overhauled our notation. In particular, a confusing sentence referred to the hyperparameter $\theta$ as a global latent variable; we have removed that sentence.
Assigned Action Editor: ~Tom_Rainforth1
Submission Number: 2894