Overcoming Annotation Scarcity for Shallow Semantic Parsing in Scientific Procedural Text


Nov 17, 2018 AKBC 2019 Conference Blind Submission readers: everyone
  • Keywords: shallow semantic parsing, event extraction, procedural text, weakly supervised, n-ary relations
  • TL;DR: Weakly supervised method for unlabelled shallow semantic structures and an associated dataset of materials science procedural text.
  • Abstract: Materials science literature contains millions of synthesis routes described in unstructured natural language text. Large-scale mining of these synthesis procedures promises a deeper scientific understanding of materials synthesis and the automated planning of synthesis procedures. This, however, requires constructing knowledge bases of synthesis procedures from natural language text. A major bottleneck in extracting these structured synthesis representations from text is the lack of labeled data on which to train or evaluate extraction models. To address this bottleneck, we introduce a dataset of 230 synthesis procedures annotated with labeled graph structures that express the semantics of the synthesis sentences. The nodes are operations and arguments in the synthesis, while labeled edges specify relations between the nodes. Next, we describe a novel weakly supervised approach to the extraction of unlabeled graph structures from synthesis sentences. The proposed model is framed as a matrix completion model parameterized by a DeepSets neural network (Zaheer et al., 2017). It outperforms a strong heuristic baseline by 4 points in precision and 2 points in F1.
  • Archival status: Archival
  • Subject areas: Machine Learning, Natural Language Processing, Information Extraction, Knowledge Representation, Applications: Science