Textonomy: A TnT-LLM-Based Approach for Interpretable Topic Modeling at Scale

ACL ARR 2025 May Submission 3968 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Automating text content analysis, particularly topic modeling, faces challenges in topic interpretability, evaluation, and scalability. This paper introduces Textonomy, a novel method based on the TnT-LLM framework, designed to address these challenges. Textonomy operates in two phases: First, it iteratively generates and refines a taxonomy using Large Language Models (LLMs) on batches of summaries, guided by a user-defined use case. Second, it pseudo-labels a subset of texts with this taxonomy via LLM-based zero-shot classification and trains a lightweight classifier for large-scale inference. We evaluate Textonomy against traditional (LDA, BERTopic) and recent LLM-based (TopicGPT) topic models on the WikiText-103 dataset. Results show that Textonomy achieves competitive or superior alignment with human-annotated ground-truth clusters (e.g., average ARI of 0.68 vs. 0.58 for TopicGPT) and demonstrates high stability. Moreover, Textonomy reduces computational cost and time by approximately 99.4% and 98.5%, respectively, compared to TopicGPT. These findings highlight Textonomy's potential for robust, interpretable, and efficient topic modeling on large corpora.
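To make the two-phase pipeline concrete, the following is a minimal Python sketch of the workflow as described in the abstract. The `llm` callable, prompt wording, batch size, and the choice of a TF-IDF plus logistic-regression model as the lightweight classifier are illustrative assumptions, not implementation details taken from the paper.

"""Hypothetical sketch of the two-phase Textonomy-style pipeline.
`llm` stands in for any chat-completion callable (prompt -> str)."""
from typing import Callable, List
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_taxonomy(summaries: List[str], use_case: str,
                   llm: Callable[[str], str], batch_size: int = 50) -> str:
    """Phase 1: iteratively generate and refine a taxonomy over batches."""
    taxonomy = ""
    for i in range(0, len(summaries), batch_size):
        batch = "\n".join(summaries[i:i + batch_size])
        taxonomy = llm(
            f"Use case: {use_case}\n"
            f"Current taxonomy:\n{taxonomy or '(empty)'}\n"
            f"Refine the taxonomy so it also covers these summaries:\n{batch}\n"
            "Return the updated taxonomy, one label per line."
        )
    return taxonomy


def pseudo_label(texts: List[str], taxonomy: str,
                 llm: Callable[[str], str]) -> List[str]:
    """Phase 2a: zero-shot label a subset of texts with the taxonomy."""
    return [
        llm(
            f"Taxonomy:\n{taxonomy}\n"
            f"Assign exactly one label from the taxonomy to this text:\n{text}"
        ).strip()
        for text in texts
    ]


def train_classifier(texts: List[str], labels: List[str]):
    """Phase 2b: fit a lightweight classifier for large-scale inference."""
    clf = make_pipeline(TfidfVectorizer(max_features=50_000),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf  # clf.predict(corpus) then labels the full corpus cheaply

The division of labor is the point of the design: the expensive LLM calls touch only a small pseudo-labeled subset, while the cheap classifier handles the full corpus, which is consistent with the cost reductions reported in the abstract.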
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: topic modeling, NLP in resource-constrained setting, human-subject application-grounded evaluations
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Theory
Languages Studied: English
Keywords: topic modeling, NLP in resource-constrained setting, human-subject application-grounded evaluations
Submission Number: 3968