Abstract: Corpus-based set expansion is the task of mining "sibling" entities of a given set of seed entities from a corpus. Previous work relies on either textual context matching or semantic matching alone, and neither method takes full advantage of the rich information in free text. We present CaSE, an efficient unsupervised corpus-based set expansion framework that leverages both lexical features and distributed representations of entities. Experiments show that CaSE outperforms state-of-the-art set expansion algorithms in expansion accuracy.