LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference

Mutsumi Nakamura; Santosh Mashetty; Mihir Parmar; Neeraj Varshney; Chitta Baral

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Findings

Submission Type: Regular Short Paper

Submission Track: Resources and Evaluation

Submission Track 2: Semantics: Lexical, Sentence level, Document Level, Textual Inference, etc.

Keywords: Logical Reasoning, Large Language Models, Natural Language Inference, Adversarial Attacks

TL;DR: We propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance.

Abstract: Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress on Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method that attacks NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average Attack Success Rate (ASR) of ~53% across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks. Data and source code are available at https://github.com/msantoshmadhav/LogicAttack.
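
To make the two inference rules named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). It checks by truth table that Modus Tollens and Bidirectional Dilemma are valid rules of propositional logic, builds one illustrative attack pair, and computes an attack success rate. The helper names (`is_valid`, `modus_tollens_attack`, `attack_success_rate`), the English sentence templates, and the reading of ASR as "share of attack samples on which the predicted label differs from the expected label" are all assumptions made for illustration.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars: int) -> bool:
    """Return True if the conclusion holds under every truth assignment
    that satisfies all premises (brute-force truth table)."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Modus Tollens: ((p -> q) and not q) |- not p
modus_tollens_valid = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not q],
    conclusion=lambda p, q: not p,
    n_vars=2,
)

# Bidirectional Dilemma: ((p -> q) and (r -> s) and (p or not s)) |- (q or not r)
bidirectional_dilemma_valid = is_valid(
    premises=[
        lambda p, q, r, s: implies(p, q),
        lambda p, q, r, s: implies(r, s),
        lambda p, q, r, s: p or (not s),
    ],
    conclusion=lambda p, q, r, s: q or (not r),
    n_vars=4,
)

# Sanity check with an invalid pattern (affirming the consequent):
# ((p -> q) and q) does NOT entail p.
affirming_consequent_valid = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)


def modus_tollens_attack(p: str, q: str) -> tuple[str, str]:
    """Hypothetical template: turn statements p and q (given without
    trailing punctuation) into a (premise, hypothesis) pair whose
    expected NLI label is 'entailment' under Modus Tollens."""
    premise = f"If {p}, then {q}. It is not the case that {q}."
    hypothesis = f"It is not the case that {p}."
    return premise, hypothesis


def attack_success_rate(predictions, expected_label: str = "entailment") -> float:
    """Assumed ASR definition: fraction of attack samples on which the
    model's predicted label differs from the expected label."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    failures = sum(1 for pred in predictions if pred != expected_label)
    return failures / len(predictions)
```

For example, `modus_tollens_attack("a man is playing a guitar", "a man is making music")` yields a premise and hypothesis that a logically consistent NLI model should label as entailment. Feeding a model's predicted labels for many such pairs to `attack_success_rate` gives the share of pairs on which it failed.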

Submission Number: 5023
