Are you using test log-likelihood correctly?

Published: 30 Jan 2024, Last Modified: 17 Sept 2024, Accepted by TMLR
Abstract: Test log-likelihood is commonly used to compare different models of the same data or different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations and (ii) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on root mean squared error.
Submission Length: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Changes Since Last Submission: Fixed an incorrect reference and added two more annotations to Figure 5.
Video: https://drive.google.com/file/d/10Hg_OBUU52ARiWjDJK7q6fYHU74my8vX/view?usp=sharing
Assigned Action Editor: ~Michael_U._Gutmann1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1531