Adversarial Winograd Schema Challenge

ACL ARR 2024 June Submission5173 Authors

16 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Although Large Language Models (LLMs) have shown remarkable proficiency in reasoning, concerns remain about hallucination and unreliable reasoning driven by semantic associations and superficial logical chains. To evaluate the extent to which LLMs reason robustly rather than relying on such superficial chains, we propose a new evaluation dataset, the Adversarial Winograd Schema Challenge (AWSC), built on the well-known Winograd Schema Challenge (WSC) dataset. By simply replacing the entities with ones more strongly associated with the wrong answer, we find that LLM performance drops significantly even though the reasoning required remains the same. Furthermore, we propose Abstraction-of-Thought (AoT), a novel prompting method that recovers normal cases from adversarial ones to improve the robustness and consistency of LLM reasoning, as demonstrated by experiments on AWSC.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Question Answering, Resources and Evaluation, Semantics: Lexical and Sentence-Level
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 5173
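
The entity replacement described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' construction pipeline: the `Schema` class, the `substitute_entities` helper, and the bird/caterpillar substitution are all hypothetical, chosen only to show how swapping entities can make the wrong candidate more strongly associated with the context ("hungry") while the correct answer and the reasoning stay the same.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Schema:
    """A Winograd-style item (hypothetical structure, not the AWSC format)."""
    sentence: str
    pronoun: str
    candidates: tuple
    answer: int  # index into candidates of the correct referent


def substitute_entities(schema, mapping):
    """Swap entities in one pass so replacements never chain into each other."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b"
    )
    new_sentence = pattern.sub(lambda m: mapping[m.group(1)], schema.sentence)
    new_candidates = tuple(mapping.get(c, c) for c in schema.candidates)
    # The answer index is untouched: the rationale is the same, only the
    # surface entities change.
    return replace(schema, sentence=new_sentence, candidates=new_candidates)


# Illustrative item and substitution (made up for this sketch).
original = Schema(
    sentence="The fish ate the worm because it was hungry.",
    pronoun="it",
    candidates=("fish", "worm"),
    answer=0,
)
adversarial = substitute_entities(
    original, {"fish": "bird", "worm": "caterpillar"}
)
print(adversarial.sentence)
```

In this toy case the eater is still the hungry one, so the correct referent is unchanged, but "caterpillar" is arguably more associated with "hungry" than "bird" is, which is the kind of distractor the abstract describes.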