Keywords: model debugging, spurious correlations, robustness
TL;DR: We propose a method that lets users rewrite a classifier's high-level prediction rules with virtually no additional data collection.
Abstract: We propose a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our method requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features.
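Since the abstract describes rule rewriting only at a high level, the sketch below illustrates one mechanism commonly used in this line of editing work: a constrained rank-one weight update that treats a chosen layer as an associative memory and forces it to map a concept's key to a new desired value. This is a minimal sketch under that assumption, not the repository's actual API; the function name, tensor shapes, and the choice of solver are all illustrative.

```python
import torch

def rank_one_edit(W, k_star, v_star, C):
    """Hypothetical rank-one rewrite of a linear layer viewed as an
    associative memory: find W' close to W (in the key-covariance
    norm) such that W' @ k_star == v_star, leaving other keys
    minimally disturbed.

    W:      (out_dim, in_dim) weight matrix of the layer being edited
    k_star: (in_dim,)  key encoding the concept whose rule we rewrite
    v_star: (out_dim,) desired value (target activation) for that key
    C:      (in_dim, in_dim) second-moment matrix of keys, K @ K.T
    """
    d = torch.linalg.solve(C, k_star)    # C^{-1} k*, the update direction
    error = v_star - W @ k_star          # gap between current and desired output
    # Closed-form constrained update: W' k* = v* holds exactly.
    return W + torch.outer(error / (k_star @ d), d)
```

In this formulation, the key k_star would come from exemplars of the concept to be rewritten and v_star from the behavior the user wants instead, which is why virtually no additional data collection is needed.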
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf
Code: https://github.com/MadryLab/EditingClassifiers
Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/editing-a-classifier-by-rewriting-its/code)