Systematic review of effect of data augmentation using paraphrasing on Named entity recognition

Saket Sharma; Aviral Joshi; Namrata Mukhija; Yiyun Zhao; Hanoz Bhathena; Prateek Singh; Sashank Santhanam; Pritam Biswas

Systematic review of effect of data augmentation using paraphrasing on Named entity recognition

Saket Sharma, Aviral Joshi, Namrata Mukhija, Yiyun Zhao, Hanoz Bhathena, Prateek Singh, Sashank Santhanam, Pritam Biswas

03 Oct 2022 (modified: 05 May 2023)Neurips 2022 SyntheticData4MLReaders: Everyone

Keywords: Data augmentation, paraphrasing, gpt-3, named entity recognition, nlp

TL;DR: Paraphrasing can help NER based on choice of paraphraser. Effect varies with dataset size.

Abstract: While paraphrasing is a promising approach for data augmentation in classification tasks, its effect on named entity recognition (NER) is not investigated systematically due to the difficulty of span-level label preservation. In this paper, we utilize simple strategies to annotate entity spans in generations and compare established and novel methods of paraphrasing in NLP such as back translation, specialized encoder decoder models such as Pegasus, and GPT-3 variants for their effectiveness in improving downstream performance for NER across different levels of gold annotations and paraphrasing strength on 5 datasets. We also analyze the quality of generated paraphrases based on their entity preservation and paraphrasing language quality. We find that the choice of the paraphraser greatly impacts NER performance, with one of the larger GPT-3 variants exceedingly capable at generating high quality paraphrases, improving performance in most cases, and not hurting others, while other paraphrasers show more mixed results. We also find inline auto annotations generated by larger GPT-3 to be strictly better than heuristic based annotations. We find diminishing benefits of paraphrasing as gold annotations increase for most datasets. While larger GPT-3 variants perform well by both entity preservation and human evaluation of language quality, those two metrics do not necessarily correlate with downstream performance for other paraphrasers.

4 Replies

Loading