Improving Data Augmentation in Low-resource Question Answering with Active Learning in Multiple Stages

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: Neural approaches have become very popular in Question Answering, but they require a large amount of annotated data. Furthermore, they often perform very well only in the domain they were trained on. In this work, we propose a novel approach that combines data augmentation via question-answer generation with active learning to improve performance in low-resource settings, where the target domain is vastly different from the source domain. We also investigate data augmentation via generation for question answering in three low-resource settings that are relevant in practice, and how it can be improved: 1) no labels for the target domain, 2) static, labelled data for the target domain, and 3) an active learning approach with labels for the target domain provided by an expert. In all settings, we assume that a sufficient amount of labelled data from the source domain is available. We perform extensive experiments in each of these settings. Our findings show that our novel approach, which combines data augmentation with active learning, boosts performance in the low-resource, domain-specific setting, allowing for question answering systems with low labelling effort in new, specialized domains. They further demonstrate how best to utilize data augmentation to boost performance in these settings.
Paper Type: long