Beyond Accuracy: Revisiting Out-of-Distribution Generalization in NLI Models

Published: 24 May 2025, Last Modified: 18 Jun 2025 · CoNLL 2025 · CC BY 4.0
Keywords: OOD Generalization, NLI, Transformers, Linear Separability
Abstract: This study investigates how well discriminative transformers generalize in Natural Language Inference (NLI) tasks. We focus on a well-studied bias in this task: the tendency of models to rely on superficial features and dataset biases rather than a genuine understanding of language. We argue that the performance differences observed between training and analysis datasets do not necessarily indicate a lack of knowledge within the model. Instead, the gap often points to a misalignment between the decision boundaries of the classifier head and the representations the encoder produces for the analysis samples. By investigating the representation space of NLI models across different analysis datasets, we demonstrate that even when accuracy is nearly random in some settings, samples from opposing classes remain almost perfectly linearly separable in the encoder's representation space. This suggests that, although the classifier head may fail on analysis data, the encoder still generalizes and encodes representations that allow for effective discrimination between NLI classes.
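A minimal sketch of the kind of linear-probe check described in the abstract is shown below; it is not necessarily the paper's exact procedure. The checkpoint name (`textattack/bert-base-uncased-MNLI`), the choice of HANS as the analysis set, and the logistic-regression probe over final-layer [CLS] vectors are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact setup): test whether NLI classes remain
# linearly separable in an encoder's representation space on an analysis set.
# Assumptions: the checkpoint, the HANS analysis set, probing the final-layer
# [CLS] vector, and the logistic-regression probe are all illustrative choices.
import numpy as np
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import AutoModel, AutoTokenizer

MODEL = "textattack/bert-base-uncased-MNLI"  # assumed MNLI-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)  # encoder only, without the classifier head
encoder.eval()

# HANS is one standard NLI analysis set (labels: 0 = entailment, 1 = non-entailment);
# any challenge set with labeled premise/hypothesis pairs would work here.
data = load_dataset("hans", split="validation").shuffle(seed=0).select(range(500))

features, labels = [], []
with torch.no_grad():
    for ex in data:
        enc = tokenizer(ex["premise"], ex["hypothesis"], truncation=True, return_tensors="pt")
        out = encoder(**enc)
        # Final-layer [CLS] vector as the sample's encoder representation.
        features.append(out.last_hidden_state[0, 0].numpy())
        labels.append(ex["label"])

X, y = np.stack(features), np.array(labels)

# If a simple linear probe reaches high cross-validated accuracy here while the
# original classifier head is near chance on the same examples, the failure lies
# in the head's decision boundaries rather than in the encoder's representations.
probe = LogisticRegression(max_iter=2000)
probe_acc = cross_val_score(probe, X, y, cv=5).mean()
print(f"Linear-probe accuracy on encoder representations: {probe_acc:.3f}")
```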
Copyright Agreement: pdf
Submission Number: 206