Keywords: Relational Triple Extraction, Natural Language Processing
Abstract: The ability to extract entities and their relations from unstructured text is essential for automated maintenance of large-scale knowledge graphs. To keep a knowledge graph up-to-date, it is required of an extractor to possess not only the ability to recall the triples encountered during training, but also the triples it has never seen before. In this paper, we show that although existing extraction models are able to memorize and recall already seen triples, they cannot generalize effectively for unseen triples. This alarming observation was previously unknown due to the composition of the test sets of the go-to benchmark datasets, which turns out to contain only 2\% unseen data, rendering them incapable to measure the generalization performance. To combat memorization and promote generalization, we present a simple yet effective noising framework that can be combined with existing models. By carefully noising the entities and their surrounding context, we refrain the model from simply memorizing the entities and their context, and promote generalization. To properly evaluate the generalization performance, we propose test set augmentation and train set sifting to emphasize unseen data. Experiments show that our model not only outperforms the current state-of-the-art in terms of generalization on the newly augmented unseen test data, but is also able to retain its memorization capabilities - achieving competitive results on the standard test data.