Delving Deep into Extractive Question Answering Data

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: The impact of large-scale pre-trained language models on Question Answering in recent times is undeniably positive. However, few prior works have attempted to provide detailed insight into how such models learn from the component parts of QA datasets. For example, what specific kinds of examples are most important for models to learn from? In this paper, we examine two English QA datasets, SQuAD1.1 and NewsQA, and report findings on the internal characteristics of these widely used extractive QA datasets. Experimental results reveal that: (i) models learn relatively independently of examples outside a given question type (performance on each question type derives mainly from training data of that same question type); (ii) increased difficulty in the training data results in better performance; (iii) learning from QA data approximates the process of learning question-answer matches.