A convergence diagnostic for Bayesian clustering

Martin Lysy, Masoud Asgharian, Vahid Partovi Nia

Published: 12 Jun 2019, Last Modified: 06 Apr 2026
Venue: WIREs Computational Statistics
License: arXiv.org perpetual, non-exclusive license

Abstract: In many applications of Bayesian clustering, posterior sampling on the discrete state space of cluster allocations is carried out via Markov chain Monte Carlo (MCMC) techniques. Because it is typically challenging to design transition kernels that explore this state space efficiently, MCMC convergence diagnostics are especially important for clustering applications. For general MCMC problems, state-of-the-art convergence diagnostics involve comparisons across multiple chains. However, single-chain alternatives can be appealing when the MCMC is computationally intensive and slowly mixing, as is typically the case for Bayesian clustering. We therefore propose a single-chain convergence diagnostic tailored specifically to discrete-space MCMC. Namely, we consider a Hotelling-type statistic on the highest-probability states and use regenerative sampling theory to derive its equilibrium distribution. By leveraging information from the unnormalized posterior, our diagnostic protects against seemingly convergent chains in which the relative frequencies of visited states are incorrect. The methodology is illustrated with a Bayesian clustering analysis of genetic mutants of the flowering plant Arabidopsis thaliana.

Keywords: Bayesian clustering, Markov chain Monte Carlo, convergence diagnostic, Hotelling statistic, regenerative sampling.
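The abstract does not spell out the statistic, so the following is only a hedged illustration of how a regenerative, Hotelling-type check on the top states could be assembled; it is not the authors' construction. It assumes (i) the most probable visited state s0 serves as the regeneration state, (ii) the standard Markov chain identity that the expected number of visits to a state s during one tour from s0 equals π(s)/π(s0), a ratio that the unnormalized posterior gives exactly, and (iii) an asymptotic chi-square reference for T² when there are many tours. The names `regenerative_hotelling`, `chi2_sf`, and `_solve` are hypothetical.

```python
import math
from collections import Counter


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular tour covariance; run the chain longer")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def chi2_sf(x, df):
    """Upper tail of chi-square(df) via the regularized incomplete gamma function."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_pref = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:  # power series for the lower tail P(a, z)
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_pref))
    # Lentz continued fraction for the upper tail Q(a, z)
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 3e-14:
            break
    return math.exp(log_pref) * h


def regenerative_hotelling(chain, log_post, k):
    """Hypothetical Hotelling-type T^2 on the k highest-posterior visited states.

    chain    : list of hashable states (e.g. canonical cluster labelings as tuples)
    log_post : callable returning the log *unnormalized* posterior of a state
    k        : number of top states to monitor (k >= 2); ties are broken arbitrarily
    Returns (t2, df, p_value) with df = k - 1 (asymptotic chi-square reference).
    """
    top = sorted(set(chain), key=log_post, reverse=True)[:k]
    s0, others = top[0], top[1:]  # assumption: regenerate at the top state
    df = len(others)
    # Target means: expected visits to s per tour from s0 equal pi(s) / pi(s0).
    mu = [math.exp(log_post(s) - log_post(s0)) for s in others]
    # Complete tours run from one visit to s0 up to (not including) the next;
    # the prefix before the first visit and the unfinished tail are discarded.
    hits = [t for t, s in enumerate(chain) if s == s0]
    tours = []
    for start, stop in zip(hits, hits[1:]):
        counts = Counter(chain[start:stop])
        tours.append([counts[s] for s in others])
    n = len(tours)
    if n <= df:
        raise ValueError("need more complete tours than monitored states")
    ybar = [sum(col) / n for col in zip(*tours)]
    cov = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in tours) / (n - 1)
            for j in range(df)] for i in range(df)]
    diff = [ybar[i] - mu[i] for i in range(df)]
    t2 = n * sum(d * w for d, w in zip(diff, _solve(cov, diff)))
    return t2, df, chi2_sf(t2, df)
```

For a clustering chain, states could be canonicalized cluster labelings (for example, tuples relabeled in order of first appearance) and `log_post` the log of the unnormalized posterior. Because the tours are i.i.d. under regeneration, the tour-level counts obey a multivariate central limit theorem, which is what motivates the chi-square reference here; a small p-value flags a chain whose visit frequencies among the top states disagree with the known posterior ratios, even when the trace looks stable.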