Relabeling Minimal Training Subset to Flip a Prediction

Anonymous

Relabeling Minimal Training Subset to Flip a Prediction

Anonymous

16 Oct 2023ACL ARR 2023 October Blind SubmissionReaders: Everyone

Abstract: When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: To flip the prediction on a test point $x_t$, how to identify the smallest training subset $\mathcal{S}_t$ we need to \textbf{relabel}? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 2\% of the training points can always flip a prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by altering training points; (2) evaluating model robustness with the cardinality of the subset (i.e., $|\mathcal{S}_t|$); we show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and $|\mathcal{S}_t|$ is correlated with but complementary to predicted probabilities; (3) revealing training points lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.

Paper Type: long

Research Area: Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Position papers

Languages Studied: English

0 Replies

Loading