Keywords: Neural topic modeling, Embedded Topic Model (ETM), hard Top-k normalization, masked softmax, decoder support restriction, sparse output distributions
Abstract: Neural topic models are commonly interpreted through short top-word lists, but in embedding-based decoders those lists depend on how topic–word logits are normalized. In the Embedded Topic Model (ETM), the default full-vocabulary softmax couples all words through normalization, so the learning signal can be dominated by low-probability tail words that rarely affect topic summaries. We replace the ETM decoder softmax with hard Top-$k$ normalization: for each topic, only the $k$ highest-logit words participate in normalization, with a tiny uniform mixture added for numerical safety. This decoder-only swap leaves the encoder, inference network, and training objective unchanged. Across three corpora, hard Top-$k$ yields higher topic quality (the product of NPMI topic coherence and topic diversity), with gains driven primarily by increased diversity. On WikiText-103 with a 20k vocabulary, topic quality improves from 0.122 to 0.212 under the same tuning protocol. A vocabulary sweep up to 30k shows that the best Top-$k$ configuration remains above the best softmax baseline at every vocabulary size. Gradient diagnostics are consistent with Top-$k$ concentrating logit updates on the head of the vocabulary that appears in top-word displays, while full softmax allocates most gradient mass to the tail.
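The hard Top-$k$ normalization described in the abstract can be sketched as a masked softmax. The following is a minimal NumPy illustration, not the authors' implementation; the function name `topk_normalize` and the mixture weight `eps` are assumptions for the example:

```python
import numpy as np

def topk_normalize(logits, k, eps=1e-6):
    """Hard Top-k normalization over the last axis.

    Only the k highest-logit entries per row (topic) participate in the
    softmax; all other entries are masked out before normalization. A tiny
    uniform mixture (weight eps) keeps every probability strictly positive
    for numerical safety, as the abstract describes.
    """
    logits = np.asarray(logits, dtype=np.float64)
    vocab_size = logits.shape[-1]
    # Boolean mask selecting the k highest logits in each row.
    topk_idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    mask = np.zeros(logits.shape, dtype=bool)
    np.put_along_axis(mask, topk_idx, True, axis=-1)
    # Masked softmax: non-top-k entries are sent to -inf, so they get
    # exactly zero mass before the uniform mixture is added.
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)
    probs = np.exp(masked)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Tiny uniform mixture for numerical safety.
    return (1.0 - eps) * probs + eps / vocab_size
```

Because words outside the top $k$ receive only the uniform floor `eps / vocab_size`, they contribute essentially no gradient to the reconstruction term, which matches the abstract's claim that updates concentrate on the displayed head of the vocabulary.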
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: calibration/uncertainty, data influence, data shortcuts/artifacts, explanation faithfulness, hierarchical & concept explanations, probing, topic modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4737