Learning Topics Using Semantic Locality

Ziyi Zhao, Krittaphat Pugdeethosapol, Sheng Lin, Zhe Li, Yanzhi Wang, Qinru Qiu

Feb 11, 2018 (modified: Feb 11, 2018) ICLR 2018 Workshop Submission
  • Abstract: Topic modeling discovers the latent topic probabilities of given text documents. To generate more meaningful topics that better represent a given document, we propose a new feature selection technique that can be used in the data preprocessing stage. The method consists of three steps. First, it generates words and word pairs from every document (feature generation). Second, it applies a two-way TF-IDF algorithm to the words and word pairs for semantic filtering (feature filtering). Third, it uses the K-means algorithm to merge word pairs that have similar semantic meaning (feature coalescence). Our proposed technique can improve the accuracy of the generated topics by up to 12.99%.
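
The three preprocessing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the two-way TF-IDF variant or the vectors given to K-means, so the sketch substitutes plain TF-IDF for the filtering step and, as an assumption, represents each word pair by the documents it occurs in. Forming pairs from all distinct words in a document is also an assumption, and every function name, parameter, and the toy corpus below is hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def generate_features(tokens):
    """Step 1 (assumed form): word counts plus unordered pairs of distinct words."""
    feats = Counter(tokens)
    for pair in combinations(sorted(set(tokens)), 2):
        feats[pair] = 1  # pairs are stored as tuples, words as strings
    return feats


def tfidf_filter(doc_features, keep_ratio=0.5):
    """Step 2 (plain TF-IDF stand-in, NOT the paper's two-way variant):
    keep the highest-scoring fraction of features in each document."""
    n_docs = len(doc_features)
    df = Counter(f for feats in doc_features for f in feats)
    kept = []
    for feats in doc_features:
        total = sum(feats.values())
        scores = {f: (c / total) * math.log(n_docs / df[f])
                  for f, c in feats.items()}
        k = max(1, int(len(scores) * keep_ratio))
        ranked = sorted(scores, key=lambda f: (-scores[f], str(f)))
        kept.append(set(ranked[:k]))
    return kept


def kmeans(vectors, k, iters=20, seed=0):
    """Basic Lloyd's K-means on lists of floats; returns one label per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # requires k <= len(vectors)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def coalesce_pairs(kept, k=2):
    """Step 3 (assumed representation): cluster word pairs by the set of
    documents they occur in, so pairs sharing a cluster can be merged."""
    pairs = sorted({f for feats in kept for f in feats
                    if isinstance(f, tuple)})
    vectors = [[1.0 if p in feats else 0.0 for feats in kept]
               for p in pairs]
    return dict(zip(pairs, kmeans(vectors, k)))


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "bond", "market"],
    ["stock", "bond", "trade"],
]
kept = tfidf_filter([generate_features(d) for d in docs])
clusters = coalesce_pairs(kept, k=2)
for pair, label in clusters.items():
    print(label, pair)
```

In this toy corpus the pair ("cat", "dog") appears in two documents, so it scores lower than pairs unique to one document and is filtered out of the first document's feature set. A realistic pipeline would likely cluster on semantic representations of the pairs instead of document-occurrence vectors; the occurrence vectors are used here only to keep the sketch self-contained.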