Abstract: Learning low dimensional representations from a large number of short corpora has a profound practical significance but with vital challenge in content analysis and data mining applications. In this paper, we propose a novel topic model called Block Bayesian Sparse Topic Coding (Block-BSTC), which is capable of discovering the latent semantic representation of short texts. The Block-BSTC relaxes the normalization constraint of the inferred representations with word embeddings and block sparse Bayesian learning, which is convenient to directly control the sparsity of word codes with exploiting the intra-block correlations. Furthermore, the experimental results show that Block-BSTC achieves great performance on the sparsity ratio of word codes. Meanwhile, it can improve the accuracy of document classification.
0 Replies
Loading