Keywords: Topic Modeling, TnT-LLM, Textonomy, Content Analysis, Large Language Models, Scalable Machine Learning, Open-Source LLMs
Abstract: Automating text content analysis via topic modeling with Large Language Models (LLMs) faces a trilemma: a trade-off between interpretability, scalability, and the accessibility of open-source models. This paper argues for a task-oriented view of topic modeling and introduces Textonomy, an implementation of the two-stage TnT-LLM framework, as a practical solution. Textonomy first uses an LLM to iteratively generate a data-driven taxonomy from a small sample of document summaries. It then trains a lightweight classifier on LLM-generated pseudo-labels for efficient, large-scale inference. We conduct a rigorous evaluation against traditional (LDA), neural (BERTopic), and pure-LLM (TopicGPT) topic models on two distinct datasets: WikiText-103 and a corpus of US Congressional bills. To address reproducibility, we benchmark Textonomy with both proprietary (OpenAI) and open-source (Mistral) LLMs. Results show that Textonomy achieves competitive or superior alignment with human-annotated ground-truth clusters while reducing computational costs by over 99% compared to TopicGPT. Our work demonstrates that classification-based frameworks can effectively solve common topic modeling tasks, offering a scalable path to highly interpretable, goal-driven content analysis.
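The second stage described in the abstract — training a lightweight classifier on LLM-generated pseudo-labels so that large-scale inference needs no further LLM calls — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the documents, the pseudo-labels, and the choice of a nearest-centroid bag-of-words classifier are all hypothetical stand-ins for whatever taxonomy, sample, and classifier Textonomy actually uses.

```python
# Hypothetical sketch of a "lightweight classifier on pseudo-labels"
# stage. Stage 1 (not shown) would have an LLM derive a taxonomy and
# pseudo-label a small sample; stage 2 fits a cheap classifier so that
# corpus-scale inference is a fast local operation, not an LLM call.
from collections import Counter
import math

def tokenize(text):
    return [w.strip(".,").lower() for w in text.split()]

def train_centroids(docs, labels):
    """Average bag-of-words vector per pseudo-label (nearest-centroid)."""
    centroids, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(tokenize(doc))
    return {lbl: Counter({w: n / counts[lbl] for w, n in c.items()})
            for lbl, c in centroids.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, centroids):
    vec = Counter(tokenize(doc))
    return max(centroids, key=lambda lbl: cosine(vec, centroids[lbl]))

# Hypothetical small sample with LLM-assigned pseudo-labels:
docs = [
    "The bill amends the tax code for small businesses.",
    "New funding is allocated for highway construction.",
    "The act revises corporate income tax brackets.",
    "Grants support repairs to bridges and public transit.",
]
pseudo_labels = ["taxation", "infrastructure", "taxation", "infrastructure"]

centroids = train_centroids(docs, pseudo_labels)
print(classify("A proposal to adjust payroll tax rates.", centroids))
# prints "taxation"
```

Once the centroids are fit, classifying a new document costs only a few vocabulary-sized dot products, which is the source of the large cost reduction relative to querying an LLM per document.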
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: topic modeling, NLP in resource-constrained setting, human-subject application-grounded evaluations
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Theory
Languages Studied: English
Submission Number: 3831