Bag of region embeddings via local context unit for text classification


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Contextual information and word orders are proved valuable for text classification task. To make use of local word order information, n-grams are commonly used features in several models, such as linear models. However, these models commonly suffer the data sparsity problem and are difficult to represent large size region. The discrete or distributed representations of n-grams can be regarded as region embeddings, which are representations of fixed size regions. In this paper, we propose two novel text classification models that learn task specific region embeddings without hand crafted features, hence the drawbacks of n-grams can be overcome. In our model, each word has two attributes, a commonly used word embedding, and an additional local context unit which is used to interact with the word's local context. Both the units and word embeddings are used to generate representations of regions, and are learned as model parameters. Finally, bag of region embeddings of a document is fed to a linear classifier. Experimental results show that our proposed methods achieve state-of-the-art performance on several benchmark datasets. We provide visualizations and analysis illustrating that our proposed local context unit can capture the syntactic and semantic information.
  • Keywords: region embedding, local context unit, text classification