Keywords: uncertainty quantification, negative log-likelihood
TL;DR: We show that NLL evaluation is misleading in UQ by studying how different parameter uncertainties affect scoring.
Abstract: Uncertainty quantification (UQ) methods for regression are frequently judged based on improvements measured in negative log-likelihood (NLL). In this work, we question the practice of relying too heavily on NLL, arguing that typical evaluations can conflate better quantifying predictive uncertainty with simply reducing it. We do so by studying how the uncertainty of various distributional parameters affects NLL scoring. In particular, we demonstrate how the error of the mean materializes as uncertainty, and how the uncertainty of the variance has almost no effect on scores. Our results question how much of the reported progress is due to decreasing, rather than accurately representing, uncertainty, and they highlight the need for additional metrics and protocols that disentangle these two factors.
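The claim that mean error materializes as uncertainty can be illustrated with a small sketch. It is not taken from the paper's experiments: it assumes a single Gaussian predictive distribution scored against a Gaussian data-generating process, and the function names are hypothetical. Under those assumptions, the expected NLL has a closed form, and the predicted variance that minimizes it equals the true noise variance plus the squared mean error.

```python
import math


def expected_gaussian_nll(mu_pred, var_pred, mu_true, var_true):
    """Expected NLL of N(mu_pred, var_pred) when y ~ N(mu_true, var_true).

    Hypothetical helper for illustration; uses the closed-form expectation
    E[(y - mu_pred)^2] = var_true + (mu_pred - mu_true)^2.
    """
    expected_sq_error = var_true + (mu_pred - mu_true) ** 2
    return 0.5 * math.log(2 * math.pi * var_pred) + expected_sq_error / (2 * var_pred)


def nll_optimal_variance(mu_pred, mu_true, var_true):
    """Predicted variance that minimizes the expected NLL for a fixed mean.

    Setting the derivative with respect to var_pred to zero gives
    var_pred = var_true + (mu_pred - mu_true)^2.
    """
    return var_true + (mu_pred - mu_true) ** 2


if __name__ == "__main__":
    mu_true, var_true = 0.0, 1.0  # assumed toy data-generating process
    for mean_error in (0.0, 0.5, 1.0):
        mu_pred = mu_true + mean_error
        best_var = nll_optimal_variance(mu_pred, mu_true, var_true)
        score = expected_gaussian_nll(mu_pred, best_var, mu_true, var_true)
        print(f"mean error {mean_error:.1f}: best variance {best_var:.2f}, expected NLL {score:.4f}")
```

In this toy setting, a model with a smaller mean error earns a lower NLL at a smaller predicted variance, even though the noise in the data is unchanged. This is one way to see how an NLL improvement can reflect reduced rather than better represented uncertainty.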
Submission Number: 104