Abstract: In this paper, we demonstrate that the performance of natural language inference (NLI) models can be enhanced using a novel adversarial approach in which large language models (LLMs) are used to systematically address NLI models’ weaknesses. We first employ LLMs to adversarially generate challenging NLI examples, retaining instances that the NLI model misclassifies and thereby building a targeted training dataset. These examples are validated by an ensemble of LLMs to ensure their correctness and are subsequently used to retrain the NLI model, iteratively refining its performance. In our evaluation, the proposed approach achieved substantial accuracy improvements on multiple datasets: 1.43% on SNLI, 2.75% on ANLI, and 4.29% on MultiNLI. These results highlight the utility of LLMs in adversarial model improvement, providing a pathway toward robust, human-independent enhancements for NLI systems. Additionally, our LLM-based approach can be used to automate the creation of NLI datasets.
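The generate-filter-validate loop described in the abstract could be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate`, `nli_predict`, and the `validators` are hypothetical callables standing in for the LLM generator, the NLI model under attack, and the LLM validation ensemble, respectively.

```python
# Hedged sketch of the adversarial data-collection loop from the abstract:
# an LLM generator proposes NLI examples, candidates the current NLI model
# misclassifies are kept, and an ensemble of LLM validators confirms the
# gold label by majority vote before the example joins the retraining set.
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str, str]  # (premise, hypothesis, gold_label)

def collect_adversarial_examples(
    generate: Callable[[], Example],                   # hypothetical LLM generator
    nli_predict: Callable[[str, str], str],            # NLI model being attacked
    validators: Iterable[Callable[[Example], bool]],   # hypothetical LLM ensemble
    n_wanted: int,
    max_attempts: int = 1000,
) -> List[Example]:
    """Return up to n_wanted validated examples that the NLI model gets wrong."""
    kept: List[Example] = []
    validator_list = list(validators)
    for _ in range(max_attempts):
        if len(kept) >= n_wanted:
            break
        premise, hypothesis, gold = generate()
        # Adversarial filter: keep only candidates the model misclassifies.
        if nli_predict(premise, hypothesis) == gold:
            continue
        # Ensemble validation: a majority of validators must accept the label.
        votes = sum(v((premise, hypothesis, gold)) for v in validator_list)
        if votes > len(validator_list) / 2:
            kept.append((premise, hypothesis, gold))
    return kept
```

In the full approach this collection step would alternate with retraining the NLI model on `kept`, so each round targets the weaknesses that remain after the previous update.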
Paper Type: Short
Research Area: Dialogue and Interactive Systems
Research Area Keywords: automatic evaluation, few-shot generation, analysis
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 1147