- Keywords: Bayesian methods, auxiliary variable methods, variational inference, conjugacy, truncated normal distribution, binary probit, categorical data
- TL;DR: We provide a "user's guide" to applying a conjugate binary model to categorical data via simple variational inference justified by principled bounds.
- Abstract: In pursuit of tractable Bayesian analysis of categorical data, auxiliary variable methods hold promise, but they either impose asymmetries on the truly unordered categories or spoil scalability through strong dependencies in the posteriors over parameters. The Diagonal Orthant Probit (DO-Probit) model proposed by Johndrow, Lum, and Dunson (AISTATS 2013) avoids these difficulties, treating all categories symmetrically while yielding tractable conditionally conjugate inference. However, we show that the intended DO-Probit likelihood for categorical observations, when paired with a normal prior, does not yield a conjugate posterior. Instead, we clarify that their posterior analysis is correct only for a different model that treats observations as multiple independent binary draws. This raises two questions: Other than tractability, what justifies the binary model for categorical data? And how should a binary model make categorical predictions? To resolve these issues, we use variational methods to obtain a lower bound on a categorical model's marginal likelihood that can be optimized by fitting the conjugate binary model. Optimizing this bound retains all the benefits advocated in the original DO-Probit work. We further extend this fast, reliable covariate-informed modeling of categorical outcomes to groups or sequences of data related in a hierarchy.