Abstract: Synonymous Named Entity Discovery (SNED) is the task of discovering named entities that refer to the same entity. Discovering synonymous named entities with manually designed features and similarity metrics is difficult because the raw features (e.g., the associated attributes and text content) are highly diverse. In this paper, we present Content-Aware Attributed Entity Embedding (CAAEE), an unsupervised SNED model that addresses this issue. By leveraging the associated attributes and text content, our approach learns a projection that maps named entities to a low-dimensional feature space without any manually designed features or supervised information. In the learned feature space, synonymous named entities lie close to each other, so proximity in this space reflects the similarity between named entities. We build two heterogeneous networks to jointly model named entities, their associated attributes, and their text content. For each heterogeneous network, we design two objective functions, based on two probability distributions, that aim to preserve the network structure. By jointly optimizing the objective functions, we obtain a low-dimensional representation for each named entity. The similarity between the learned low-dimensional representations is then used to discover synonymous named entities. In experiments, we compare our model with existing SNED models on two real-world named entity datasets. Experimental results show that CAAEE outperforms state-of-the-art methods by a significant margin.
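The final step described above, ranking candidate synonyms by similarity between learned representations, can be illustrated with a minimal sketch. It assumes cosine similarity as the metric (the abstract does not say which similarity is used), and the function names `cosine` and `synonym_candidates` as well as the toy vectors are hypothetical, hand-made for illustration rather than produced by CAAEE.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_candidates(embeddings, query, k=1):
    """Return the k entity names whose vectors are most similar to the query's vector."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy, hand-made vectors standing in for learned entity representations (assumption).
toy_embeddings = {
    "IBM": [0.9, 0.1, 0.0],
    "International Business Machines": [0.85, 0.15, 0.05],
    "Apple Inc.": [0.1, 0.9, 0.2],
}

print(synonym_candidates(toy_embeddings, "IBM", k=1))
```

In this sketch the embedding step is taken as given; only the nearest-neighbour lookup in the learned space is shown.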