Power posteriors do not reliably learn the number of components in a finite mixture

Published: 09 Dec 2020, Last Modified: 05 May 2023, ICBINB 2020 Spotlight
Keywords: finite mixture models, model misspecification, robust Bayesian modeling, number of components
TL;DR: Finite mixture models do not reliably learn the number of components, even with power posteriors
Abstract: Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Data science folk wisdom tells us that a finite mixture model (FMM) with a prior on the number of components will fail to recover the true, data-generating number of components under model misspecification. But practitioners still widely use FMMs to learn the number of components, and statistical machine learning papers can be found recommending such an approach. Increasingly, though, data science papers suggest potential alternatives beyond vanilla FMMs, such as power posteriors, coarsening, and related methods. In this work we start by adding rigor to folk wisdom and proving that, under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of latent components converges to 0 in the limit of infinite data. We use the same theoretical techniques to show that power posteriors with fixed power face the same undesirable divergence, and we provide a proof for the case where the power converges to a non-zero constant. We illustrate the practical consequences of our theory on simulated and real data. We conjecture how our methods may be applied to lend insight into other component-count robustification techniques.
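The divergence described in the abstract can be previewed numerically. The sketch below is a hypothetical illustration under stated assumptions, not the paper's method: it fits Gaussian mixtures to data drawn from a single heavy-tailed (Student-t) component, a mild misspecification of the Gaussian family, and approximates the component-count posterior with a BIC-style penalty, optionally tempering the likelihood with a fixed power `zeta` in the spirit of a power posterior. The function `component_count_posterior`, the Student-t data-generating process, and the BIC approximation to the marginal likelihood are all assumptions made here for illustration; the qualitative behavior to look for is posterior mass drifting toward larger K as n grows, with or without tempering.

```python
# Illustrative sketch (assumptions noted above, not the paper's exact procedure):
# approximate p(K | data) for Gaussian mixtures fit to slightly misspecified data,
# using a BIC-style approximation to the marginal likelihood and an optional
# fixed likelihood power zeta (a crude stand-in for a power posterior).

import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def component_count_posterior(x, max_k=10, zeta=1.0):
    """Approximate the posterior over K = 1..max_k under a uniform prior on K.

    For each K, the (optionally tempered) total log-likelihood at the fitted
    parameters is penalized by (d_K / 2) * log(n), where d_K counts the free
    parameters of a 1-D Gaussian mixture with K components.
    """
    x = x.reshape(-1, 1)
    n = len(x)
    log_scores = []
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(x)
        total_loglik = gm.score(x) * n      # score() is the per-sample average
        d_k = 3 * k - 1                     # (k-1) weights + k means + k variances
        log_scores.append(zeta * total_loglik - 0.5 * d_k * np.log(n))
    log_scores = np.array(log_scores)
    return np.exp(log_scores - logsumexp(log_scores))

# Slightly misspecified data: one heavy-tailed Student-t component, while the
# model family assumes Gaussian components.
for n in [200, 2000, 20000]:
    x = rng.standard_t(df=5, size=n)
    post_full = component_count_posterior(x, max_k=6)
    post_tempered = component_count_posterior(x, max_k=6, zeta=0.5)
    print(f"n={n:6d}  argmax K (zeta=1.0): {1 + post_full.argmax()}  "
          f"argmax K (zeta=0.5): {1 + post_tempered.argmax()}")
```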
