Turftopic: Topic Modelling with Contextual Representations from Sentence Transformers

Márton Kardos, Kenneth Enevoldsen, Jan Kostkan, Ross Deans Kristensen-McLachlan, Roberta Rocca

Published: 01 Jan 2025, Last Modified: 19 Jan 2026
Venue: The Journal of Open Source Software
License: CC BY-SA 4.0
Abstract: Topic models are machine learning techniques that discover themes in a collection of documents. Turftopic is a topic modelling library that implements a number of recent developments in topic modelling. These models go beyond bag-of-words approaches and can understand text in context by using representations from transformers. Turftopic focuses on ease of use: it provides a unified interface for a number of modern topic models and offers both model-specific and model-agnostic interpretation and visualization utilities. While users have great flexibility in model choice and customization, the library ships with reasonable defaults so as not to overwhelm first-time users. In addition, Turftopic allows users to: a) model how topics change over time, b) learn topics online from a stream of texts, c) find hierarchical structure in topics, and d) learn topics in multilingual texts and corpora. Users can also leverage large language models (LLMs) to give topics human-readable names. Finally, Turftopic comes with built-in utilities for generating topic descriptions based on key phrases or lemmas rather than individual words.