Abstract: Question Answering (QA) is widely used to evaluate the reasoning capabilities of NLP systems, where an important ability is deciding on answerability, i.e., whether a question can be answered with the information at hand. Previous work has studied answerability by including a fixed proportion of unanswerable questions in a collection, without justifying that proportion or analyzing its impact on system results. In this work, we study different scenarios for answerability detection and evaluate several Large Language Models under varying rates of unanswerable questions, which we introduce into the popular multiple-choice QA dataset RACE. We show that a 30\% rate of unanswerable questions at training time appears optimal across a variety of scenarios, and we support this finding with a series of extended experiments. Nevertheless, we observe that systems tend to expect the same rate of unanswerable questions seen at training time, and that the ability to decide on answerability always comes at the expense of the ability to find the answer when it exists.
Paper Type: long
Research Area: Question Answering
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.