Abstract: The initial purpose of topic models was to identify latent topical clusters within unstructured text. Meanwhile, the focus of advanced studies has changed primarily to estimating the relationship between the discovered topical structure and theoretically relevant metadata. Methods used to estimate such relationships must take into account that the topical structure is not directly observed, but instead being estimated itself in an unsupervised fashion. In the Structural Topic Model (STM;Roberts et al., 2016), for instance, multiple repeated linear regressions of sampled topic proportions on metadata covariates are performed. This is done by using a Monte Carlo sampling technique known as the \textit{method of composition}. In this paper, we propose two modifications of this approach: First, we implement a substantial correction to the model by replacing linear regression with the more appropriate Beta regression. Second, we provide a fundamental enhancement of the entire estimation framework by substituting the current blending of frequentist and Bayesian methods with a fully Bayesian approach instead. This allows for a more appropriate quantification of uncertainty. We illustrate our improved methodology by investigating relationships between Twitter posts by German parliamentarians and different metadata covariates related to their electoral districts.
Paper Type: long
0 Replies
Loading