S$^2$ynRE: Two-stage Self-training with Synthetic data for Low-resource Relation Extraction

Anonymous

16 Oct 2022 (modified: 05 May 2023)
ACL ARR 2022 October Blind Submission
Readers: Everyone
Keywords: relation extraction, data synthesis, large language model
Abstract: Current relation extraction methods suffer from the scarcity of large-scale annotated data. While distant supervision alleviates the problem of data quantity, the quality of the resulting data is limited by domain disparity, since it relies on domain-restricted knowledge bases. In this work, we propose S$^2$ynRE, a framework of two-stage Self-training with Synthetic data for Relation Extraction. We first leverage the capability of large language models to adapt to the target domain and automatically synthesize large quantities of coherent, realistic training data. We then propose an accompanying two-stage self-training algorithm that iteratively and alternately learns from synthetic and gold data. We conduct comprehensive experiments and detailed ablations on popular relation extraction datasets to demonstrate the effectiveness of the proposed framework. Under low-resource settings in particular, S$^2$ynRE brings up to 17.18% absolute improvement, and 12.63% on average, across all datasets.
Paper Type: long
Research Area: Information Extraction
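
As a rough illustration of the alternating schedule described in the abstract, the sketch below uses a simple scikit-learn classifier as a stand-in for the relation extractor. This is a hypothetical example, not the authors' implementation: the feature matrices, label arrays, and iteration count are all invented, and the real framework pseudo-labels LLM-synthesized sentences rather than random vectors.

```python
# Hypothetical sketch of alternating two-stage self-training on synthetic
# and gold data. A linear classifier stands in for the relation extractor;
# all names, shapes, and values here are invented for illustration.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_gold = rng.normal(size=(40, 16))            # small gold set (features)
y_gold = rng.integers(0, 2, size=40)          # gold relation labels
X_syn = rng.normal(size=(400, 16))            # large synthesized set

clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_gold, y_gold, classes=np.unique(y_gold))  # warm start on gold

for _ in range(3):                            # self-training iterations
    y_pseudo = clf.predict(X_syn)             # pseudo-label the synthetic data
    clf.partial_fit(X_syn, y_pseudo)          # stage 1: learn from synthetic
    clf.partial_fit(X_gold, y_gold)           # stage 2: refit on gold labels
```

In the paper's setting, stage 1 would train a neural relation extractor on pseudo-labeled synthetic sentences and stage 2 would fine-tune it on the gold annotations; the loop above only mirrors that schedule.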