Original Pdf: pdf
Code: [![Papers with Code](/images/pwc_icon.svg) 2 community implementations](https://paperswithcode.com/paper/?openreview=Byg1v1HKDB)
Data: [ART Dataset](https://paperswithcode.com/dataset/art-dataset), [ATOMIC](https://paperswithcode.com/dataset/atomic), [HellaSwag](https://paperswithcode.com/dataset/hellaswag), [ROCStories](https://paperswithcode.com/dataset/rocstories), [WSC](https://paperswithcode.com/dataset/wsc), [WinoGrande](https://paperswithcode.com/dataset/winogrande)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:1908.05739/code)
Keywords: Abductive Reasoning, Commonsense Reasoning, Natural Language Inference, Natural Language Generation
Abstract: Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks – (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform—despite their strong performance on the related but more narrowly defined task of entailment NLI—pointing to interesting avenues for future research.