Predicting generalization performance with correctness discriminators

Published: 01 Jan 2024 · Last Modified: 04 Nov 2025 · EMNLP (Findings) 2024 · CC BY-SA 4.0
Abstract: The ability to predict an NLP model’s accuracy on unseen, potentially out-of-distribution data is a prerequisite for trustworthiness. We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data. We achieve this by training a *discriminator* which predicts whether the output of a given sequence-to-sequence model is correct or not. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds, and that these bounds are remarkably close together.
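To make the bound construction concrete, below is a minimal sketch of how upper and lower accuracy bounds could be derived from discriminator verdicts. It assumes the discriminator's false-accept and false-reject rates are estimated on a small labeled calibration set and then used to widen its acceptance rate on unlabeled data; all function and variable names are hypothetical, and the paper's exact construction may differ.

```python
"""Sketch: accuracy bounds from a correctness discriminator.

A minimal illustration, not the paper's exact method: we assume the
discriminator's error rates can be estimated on a labeled calibration
set and used to bracket accuracy on unlabeled data.
"""


def calibrate(disc_verdicts, gold_correct):
    """Estimate discriminator error rates on labeled calibration data.

    disc_verdicts: list[bool], True if the discriminator accepts an output.
    gold_correct:  list[bool], True if the output is actually correct.
    Returns (false_accept_rate, false_reject_rate) as fractions of all
    calibration examples.
    """
    n = len(disc_verdicts)
    false_accepts = sum(v and not g for v, g in zip(disc_verdicts, gold_correct))
    false_rejects = sum(g and not v for v, g in zip(disc_verdicts, gold_correct))
    return false_accepts / n, false_rejects / n


def accuracy_bounds(disc_verdicts, fa_rate, fr_rate):
    """Bracket the model's accuracy on unlabeled data.

    The discriminator's acceptance rate over-counts accuracy by its
    false accepts and under-counts it by its false rejects, so the
    calibrated error rates give a lower and an upper bound.
    """
    accept_rate = sum(disc_verdicts) / len(disc_verdicts)
    lower = max(0.0, accept_rate - fa_rate)
    upper = min(1.0, accept_rate + fr_rate)
    return lower, upper


if __name__ == "__main__":
    # Toy calibration set: discriminator verdicts vs. gold correctness.
    verdicts_cal = [True, True, False, True, False, True, False, True]
    gold_cal = [True, True, False, False, False, True, True, True]
    fa, fr = calibrate(verdicts_cal, gold_cal)

    # Unlabeled test set: only discriminator verdicts are available.
    verdicts_test = [True, False, True, True, False, True, True, False]
    lo, hi = accuracy_bounds(verdicts_test, fa, fr)
    print(f"predicted accuracy in [{lo:.2f}, {hi:.2f}]")
```

On the toy data above this prints an interval such as [0.50, 0.75]; the paper's claim is that on real tagging and parsing tasks the gold accuracy falls inside the predicted interval, and the interval is tight.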