Abstract: Highlights
• Evaluation of topic modeling and clustering on health-related tweets and emails.
• Topic modeling: LSI, LDA, BTM, GibbsLDA, Online LDA, Online Twitter LDA, and GSDMM.
• Clustering: k-means with two feature representations (TF-IDF and Doc2Vec).
• The evaluation is based on two internal and five external cluster validity indices.