Know your tools well: Better $\textit{and}$ faster QA with synthetic examples

Anonymous

16 Oct 2021 (modified: 05 May 2023) · ACL ARR 2021 October Blind Submission · Readers: Everyone
Abstract: Synthetic training data---commonly used to augment human-labeled examples in supervised learning---are often noisy, but can be generated in very large quantities and with high diversity. This paper proposes to exploit these unique attributes in a targeted manner to maximize the utility of synthetic examples. Through two novel applications that use synthetic data for targeted pre-training and knowledge distillation, we demonstrate the feasibility of this idea for machine reading comprehension (MRC). With our proposed methods, we train MRC models that are simultaneously $\textbf{\textit{smaller}}$, $\textbf{\textit{faster}}$, and $\textbf{\textit{more accurate}}$ than those produced by existing synthetic augmentation methods. Our methods are generic in nature and can be applied to any task for which synthetic data can be generated.
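
As a rough illustration of the second application, the sketch below shows generic knowledge distillation on synthetic MRC examples: a student model is trained to match a teacher's soft answer-span (start/end) distributions over the synthetic question-context pairs. This is a minimal sketch, not the paper's actual method; the model signatures, batch keys, and temperature are all illustrative assumptions, since the abstract does not specify the architectures, loss, or data pipeline.

```python
# Hypothetical sketch: knowledge distillation on synthetic extractive-QA data.
# Assumptions (not from the paper): both models take (input_ids, attention_mask)
# and return per-token start/end logits; batches come from a synthetic generator.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer, temperature=2.0):
    """One KD update: the student matches the teacher's temperature-softened
    answer-span distributions on a batch of synthetic QA examples."""
    with torch.no_grad():
        t_start, t_end = teacher(batch["input_ids"], batch["attention_mask"])
    s_start, s_end = student(batch["input_ids"], batch["attention_mask"])

    def kd_loss(s_logits, t_logits):
        # KL divergence between softened student and teacher distributions,
        # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
        return F.kl_div(
            F.log_softmax(s_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature**2

    # Sum the span-start and span-end distillation losses.
    loss = kd_loss(s_start, t_start) + kd_loss(s_end, t_end)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because synthetic examples can be generated in bulk, a loop over such steps can expose the student to far more (teacher-labeled) data than the human-annotated set alone, which is the attribute the abstract proposes to exploit.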