Abstract: Models employing heteroscedastic Gaussian likelihoods parameterized by amortized mean and variance networks are both probabilistically interpretable and highly flexible, but unfortunately can be brittle to optimize. Maximizing log likelihood encourages local Dirac densities when the mean and variance networks are sufficiently flexible, and data points lacking nearby neighbors can provide this flexibility. Gradients near these unbounded optima explode, prohibiting convergence of the mean and thus requiring high noise variance to explain the dependent variable. We propose posterior predictive checks to identify such failures, which we observe can surreptitiously occur alongside high model likelihoods. We find that existing approaches which bolster optimization of mean and variance networks to improve likelihoods still exhibit poor predictive mean and variance calibration. Our notably simpler solution, treating heteroscedastic variance variationally in an Empirical Bayes regime, regularizes variance away from zero and stabilizes optimization, allowing us to match or outperform existing likelihoods while improving predictive mean and variance calibration and thereby sample quality. We empirically demonstrate these findings on a variety of regression and variational autoencoding tasks.
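To make the failure mode and the flavor of the fix concrete, below is a minimal sketch in PyTorch (not from the submission). The plain heteroscedastic Gaussian negative log likelihood is unbounded below as the predicted variance collapses toward zero at points the mean network fits exactly; the added inverse-gamma-style penalty on the variance is only a hypothetical stand-in for the paper's variational Empirical Bayes treatment, whose exact objective is not given in the abstract. Network sizes, hyperparameters (prior_alpha, prior_beta), and the toy data are illustrative.

    import torch
    import torch.nn as nn

    class HeteroscedasticRegressor(nn.Module):
        """Amortized mean and log-variance networks for a Gaussian likelihood."""
        def __init__(self, in_dim: int, hidden: int = 64):
            super().__init__()
            self.mean_net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            self.logvar_net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):
            return self.mean_net(x), self.logvar_net(x)

    def gaussian_nll(y, mean, logvar):
        # Heteroscedastic Gaussian negative log likelihood (constants dropped).
        # As the predicted variance -> 0 wherever the mean net interpolates y,
        # this objective diverges to -inf: the local Dirac collapse above.
        return 0.5 * (logvar + (y - mean) ** 2 * torch.exp(-logvar)).mean()

    def regularized_nll(y, mean, logvar, prior_alpha=2.0, prior_beta=1.0):
        # Hypothetical stand-in for the variational / Empirical Bayes treatment:
        # the negative log of an inverse-gamma prior on the variance,
        # (alpha + 1) * log(var) + beta / var, which blows up as var -> 0
        # and so keeps the predicted variance away from zero.
        var = torch.exp(logvar)
        penalty = (prior_alpha + 1.0) * logvar + prior_beta / var
        return gaussian_nll(y, mean, logvar) + penalty.mean()

    # Toy usage on synthetic data (shapes and settings are illustrative).
    x = torch.randn(128, 3)
    y = torch.sin(x.sum(dim=1, keepdim=True)) + 0.1 * torch.randn(128, 1)
    model = HeteroscedasticRegressor(in_dim=3)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):
        mean, logvar = model(x)
        loss = regularized_nll(y, mean, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()

Swapping regularized_nll for gaussian_nll on sparse regions of the input space is where the abstract's exploding-gradient behavior would be expected to appear; the penalty bounds the objective and stabilizes optimization.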
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Stephan_M_Mandt1
Submission Number: 185