Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: Dynamic Self-Supervised Learning, Commonsense Reasoning
Abstract: Commonsense question answering (QA), a central task in natural language processing, requires mining clues in the context to reason about the answer to a question. Despite the advances of current pre-trained models, e.g. BERT, they often learn artifactual causal relations between clues in the context and the question, because the training data contains similar but artifactual clues or highly frequent question-clue pairs. To address this issue, we propose DynamIc Self-sUperviSed Erasure (DISUSE), which adaptively erases redundant and artifactual clues in the context and questions so as to learn the correct correspondence between questions and their clues. Specifically, DISUSE consists of an erasure sampler and a supervisor. The erasure sampler estimates correlation scores between all clues and the question in an attention manner, and then erases each clue (an object in an image, or a word in the question or context) with a probability that depends inversely on its correlation score. In this way, clues that are redundant or artifactual with respect to the current question are removed, while necessary and important clues are preserved. The supervisor then evaluates the current erasure quality by checking whether the erased sample and its vanilla counterpart yield consistent answer prediction distributions, and minimizes the KL divergence between these two distributions to progressively improve erasure quality in a self-supervised manner. As a result, DISUSE learns more precise question-clue correspondences, and thus answers new questions more accurately by reasoning over the key clues in their contexts. Extensive experimental results on a reading comprehension dataset (ReClor) and VQA datasets (GQA and VQA 2.0) demonstrate the superiority of DISUSE over the state of the art.
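The two components described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the dot-product attention scoring, the erasure schedule, and the toy prediction model are all placeholder assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def erasure_sampler(question_vec, clue_vecs, rng):
    """Score each clue against the question and sample an erasure mask:
    clues with low correlation scores are erased with high probability."""
    # Attention-style correlation scores (dot product, softmax-normalized).
    scores = softmax(clue_vecs @ question_vec)
    # Erasure probability depends inversely on the correlation score,
    # so the most relevant clue is never erased.
    erase_prob = 1.0 - scores / scores.max()
    keep_mask = rng.random(len(scores)) >= erase_prob  # True = keep clue
    return keep_mask, scores

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two answer prediction distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def predict(clue_vecs, keep_mask):
    """Placeholder 'model': an answer distribution over 3 candidates,
    computed from the mean of the kept clue features."""
    kept = clue_vecs[keep_mask] if keep_mask.any() else clue_vecs
    return softmax(kept.mean(axis=0)[:3])

# Toy example: one question vector and 5 clue vectors of dimension 8.
question = rng.standard_normal(8)
clues = rng.standard_normal((5, 8))

keep_mask, scores = erasure_sampler(question, clues, rng)
vanilla = predict(clues, np.ones(5, dtype=bool))  # all clues kept
erased = predict(clues, keep_mask)                # redundant clues dropped

# The supervisor would minimize this divergence during training,
# pushing the erased sample toward the vanilla prediction.
loss = kl_divergence(erased, vanilla)
```

In the paper's full pipeline the prediction model is a pre-trained QA network and the KL term is back-propagated to refine the sampler; here the loss is only computed to show the self-supervision signal.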
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=V5J_a93NDE