Interactive Model Correction with Natural Language

Published: 27 Oct 2023, Last Modified: 25 Nov 2023 · NeurIPS XAIA 2023
TL;DR: A natural language interface for giving feedback on model errors from spurious correlations; works at ImageNet scale.
Abstract: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on spurious correlations and fail to generalize to new data distributions, such as a bird classifier that relies on the background of an image. Preventing models from latching on to spurious correlations necessarily requires additional information beyond labeled data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets, since they require additional annotation at a scale close to that of the original training data. We hypothesize that far less supervision suffices if we provide targeted feedback about the misconceptions of models trained on a given dataset. We introduce Clarify, a novel natural language interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description of a model's consistent failure patterns, such as "water background" for a bird classifier. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our empirical results show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 7.3% on two datasets with spurious correlations. Finally, we use Clarify to find and rectify 31 novel spurious correlations in ImageNet, improving minority-split accuracy from 21.1% to 28.7%.
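The reweighting step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes per-image similarity scores to the user's text description (e.g. cosine similarities from a vision-language model such as CLIP) have already been computed, and the `threshold` parameter is a hypothetical choice for splitting the data into a slice that matches the described spurious feature and a slice that does not. Each example is then weighted inversely to the size of its slice, upweighting the minority slice during training.

```python
import numpy as np

def reweight_by_description(similarities, threshold=0.5):
    """Illustrative sketch: given per-image similarity scores to a
    user-provided error description (e.g. "water background"), split
    the data into matching / non-matching slices and weight each
    example inversely to its slice size, so the minority slice is
    upweighted. The threshold and weighting scheme are assumptions,
    not the method's exact formulation."""
    similarities = np.asarray(similarities, dtype=float)
    matches = similarities >= threshold  # slice matching the description
    n = len(similarities)
    n_match = int(matches.sum())
    # Inverse-frequency weights per slice (guard against empty slices).
    weights = np.where(
        matches,
        n / (2 * max(n_match, 1)),
        n / (2 * max(n - n_match, 1)),
    )
    # Normalize so the average weight is 1, keeping the loss scale stable.
    return weights / weights.mean()
```

For example, if three of four training images match the description, the lone non-matching image receives three times the weight of each matching image, counteracting the imbalance that induced the spurious correlation.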
Submission Track: Full Paper Track
Application Domain: Computer Vision
Survey Question 1: We developed an interface that allows users to correct machine learning models by simply describing the model's errors in natural language. Explainability plays a crucial role in our work, as the user-provided descriptions clarify the model's weaknesses, allowing for targeted improvements.
Survey Question 2: Traditional models suffer from spurious correlations. To detect such model misconceptions using only validation data, we look for failures that are easy to explain with natural language.
Survey Question 3: Our work employs a custom web interface based on natural language descriptions for explainability.
Submission Number: 86