TL;DR: A Comprehensive Framework for Evaluating and Enhancing Language Model Robustness to Multiple Input Perturbations.
Abstract: Language models, given their black-box nature, often hallucinate and are sensitive to input perturbations, raising concerns about trust. To build trust, it is imperative to gain a comprehensive understanding of a model's failure modes and to develop effective strategies for improving its performance. In this study, we introduce a framework for examining how input perturbations affect language models across scales, from pre-trained models to large language models (LLMs). Using fine-tuning, we enhance the model's robustness to input perturbations. We also investigate whether exposure to one perturbation improves or degrades the model's performance on other perturbations. To address robustness against multiple perturbations, we present three distinct fine-tuning strategies. We further extend the framework to LLMs via a chain-of-thought (CoT) prompting approach with exemplars. Applied to the Tabular-NLI task, the framework demonstrates that the proposed strategies train the model to handle diverse perturbations effectively without compromising accuracy on the original set.
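As a rough illustration of the CoT-prompting-with-exemplars idea mentioned above, the sketch below builds a prompt that pairs a worked Tabular-NLI exemplar with a new instance. This is not the authors' code: the table serialization, exemplar wording, and the `query_llm` callable are all illustrative assumptions.

```python
# A minimal sketch (not the paper's implementation) of chain-of-thought (CoT)
# prompting with an exemplar for Tabular-NLI. The table format, exemplar text,
# and `query_llm` callable are assumptions made for illustration only.

COT_EXEMPLAR = (
    "Table: | Name: Alice | Age: 34 | City: Paris |\n"
    "Hypothesis: Alice is older than 30.\n"
    "Reasoning: The table states that Age is 34, and 34 > 30, "
    "so the hypothesis follows from the table.\n"
    "Label: entailment\n"
)

def build_cot_prompt(table: str, hypothesis: str) -> str:
    """Prepend a worked exemplar, then ask the model to reason step by step
    before labeling the new (table, hypothesis) pair."""
    return (
        "Decide whether the hypothesis is entailed by, contradicted by, "
        "or neutral with respect to the table. Reason step by step, "
        "then give a final Label.\n\n"
        f"{COT_EXEMPLAR}\n"
        f"Table: {table}\n"
        f"Hypothesis: {hypothesis}\n"
        "Reasoning:"
    )

# Usage with any text-completion LLM client (hypothetical `query_llm`):
# answer = query_llm(build_cot_prompt(table, hypothesis))
```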
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English