Abstract: Creating a knowledge graph from raw unstructured text has traditionally been the job of domain experts, who take months to curate and refine it. In this paper, we propose a domain-independent, semi-automatic knowledge graph learning system that can be trained with less data to identify entities and relations in a large text corpus. The system performs the following tasks to extract a knowledge graph from text: (i) Named Entity Recognition (NER), and (ii) Relation Identification, which comprises Open Relation Extraction (OpenRE) and relation classification. The system uses deep active learning, computing a confidence score for each prediction with maximum normalized log-probability for both NER and relation identification. We experimented with both LSTM and transformer-based models for the NER and relation identification tasks. We achieved around 88% F1 score for the NER task on the OntoNotes-5.0 English dataset using 40% of the training data, and above 83% F1 score for relation identification on the TACRED dataset. The OpenRE and relation classification systems were trained on domain-specific datasets. To the best of our knowledge, we are the first to introduce a knowledge graph learning system with deep active learning.
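As an illustration of the confidence scoring mentioned above, the following is a minimal sketch of maximum normalized log-probability (MNLP) used to pick sentences for annotation. It is not the system's implementation: the function names `mnlp_score` and `select_for_annotation` are hypothetical, and the sketch assumes the model exposes an independent probability distribution over labels for each token, so that the most likely label sequence is simply the per-token argmax. A CRF-style decoder would need the score of its best path instead.

```python
import math

def mnlp_score(token_probs):
    """Length-normalized log-probability of the most likely label sequence.

    token_probs: one probability distribution (list of floats) per token.
    Assumption: tokens are scored independently, so the best sequence
    takes the highest-probability label at every position.
    """
    if not token_probs:
        raise ValueError("sentence must contain at least one token")
    total = sum(math.log(max(dist)) for dist in token_probs)
    # Dividing by the length avoids favouring long sentences, whose
    # unnormalized log-probability is lower simply because they are longer.
    return total / len(token_probs)

def select_for_annotation(pool, k):
    """Return indices of the k least confident sentences in the pool."""
    ranked = sorted(range(len(pool)), key=lambda i: mnlp_score(pool[i]))
    return ranked[:k]

# Hypothetical pool: three sentences with two-label distributions per token.
pool = [
    [[0.9, 0.1], [0.8, 0.2]],   # fairly confident
    [[0.5, 0.5], [0.6, 0.4]],   # uncertain
    [[0.99, 0.01]],             # very confident
]
print(select_for_annotation(pool, 2))
```

In an active learning loop, the selected sentences would be sent to an annotator and added to the training set before the model is retrained.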