Reliability evaluation and an update algorithm for the latent Dirichlet allocation

Jonas Rieger

Published: 2022, Last Modified: 29 Jan 2024, Readers: Everyone

Abstract: Modeling text data is becoming increasingly popular. Topic models, and in particular the latent Dirichlet allocation (LDA), are widely used in text data analysis. One problem in this context is that running LDA repeatedly on the same data yields different results. Reliability can be improved by repeated modeling and a reasonable choice of a representative. Updating existing LDA models with new data is another common challenge. Many dynamic models, when new data are added, also update the parameters of past time points and thus do not ensure the temporal consistency of the results. In this cumulative dissertation, I summarize in particular my methodological papers from two areas: improving the reliability of LDA results, and updating LDA results in a temporally consistent manner for use in monitoring scenarios. For this purpose, I first introduce the state of research for each of the two areas. After explaining the idea of the corresponding method, I give examples of applications in which the method has already been used and describe its implementation as an R package. Finally, for both areas I provide an outlook on potential further research.
