Keywords: Natural Language Inference, Benchmark Construction
Abstract: Natural Language Inference (NLI) is a core task for language understanding, yet existing NLI datasets are static and no longer challenging: current Large Language Models (LLMs) score well on them without revealing their true capabilities and shortcomings. To address this problem, we propose a data augmentation framework that automatically builds more challenging NLI datasets from existing ones by iteratively fusing rich facts into the premise and hypothesis of each NLI instance. A strict fact filter ensures that the fused facts are non-contradictory and non-redundant. Applied to SNLI and MNLI, our augmentation substantially increases data length and complexity, and the performance of a range of LLMs on the augmented datasets drops significantly (by up to 30%). Ablation experiments and human quality checks confirm the high quality of the generated data.
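The iterative fuse-and-filter loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `NLIInstance`, `augment`, `contradicts`, `redundant`, and `max_fused` are hypothetical, the two filter checks are left as caller-supplied functions because the abstract does not say how they are computed, and appending the same fact to both sides while keeping the original label is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class NLIInstance:
    premise: str
    hypothesis: str
    label: str  # e.g. "entailment", "neutral", "contradiction"


# Hypothetical check signature: (candidate fact, current text) -> bool
Check = Callable[[str, str], bool]


def augment(
    instance: NLIInstance,
    facts: Iterable[str],
    contradicts: Check,
    redundant: Check,
    max_fused: int = 3,
) -> NLIInstance:
    """Fuse up to `max_fused` filtered facts into premise and hypothesis."""
    premise, hypothesis = instance.premise, instance.hypothesis
    fused = 0
    for fact in facts:
        if fused >= max_fused:
            break
        # Re-read the current text each round so that facts fused earlier
        # are taken into account by the filter.
        context = f"{premise} {hypothesis}"
        if contradicts(fact, context) or redundant(fact, context):
            continue  # the filter rejects this fact
        # Assumption: the same fact is appended to both sides and the
        # original label is carried over unchanged.
        premise = f"{premise} {fact}"
        hypothesis = f"{hypothesis} {fact}"
        fused += 1
    return NLIInstance(premise, hypothesis, instance.label)
```

In practice the two checks would presumably be backed by a model or a knowledge source; passing them in as plain functions keeps the loop itself independent of that choice.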
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 10938