Editing a classifier by rewriting its prediction rules

May 21, 2021 (edited Oct 27, 2021) · NeurIPS 2021 Poster · Readers: Everyone
  • Keywords: model debugging, spurious correlations, robustness
  • TL;DR: We propose a method that allows users to rewrite a classifier's high-level prediction rules with virtually no additional data collection.
  • Abstract: We propose a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our method requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments, and modifying it to ignore spurious features.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/MadryLab/EditingClassifiers (an illustrative sketch of the weight-editing idea follows below)
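The released code linked above implements the full editing procedure. As a rough illustration of what "rewriting a prediction rule" can mean, the sketch below applies a minimum-norm rank-one weight edit so that a layer maps the feature ("key") of a new concept to the value it already produces for a reference concept; this is a simplified assumption about the mechanism, and the function name `rank_one_edit` and the toy keys are hypothetical, not part of the released EditingClassifiers API.

```python
# Hypothetical sketch (not the released API): edit one linear layer so that it
# responds to a new concept's feature the same way it already responds to a
# reference concept, without collecting new training data.
import torch


def rank_one_edit(weight: torch.Tensor,
                  key_new: torch.Tensor,
                  key_ref: torch.Tensor) -> torch.Tensor:
    """Return W' such that W' @ key_new == W @ key_ref, using the smallest
    (minimum-norm) rank-one change to W."""
    value_ref = weight @ key_ref                 # output the layer already gives the reference concept
    residual = value_ref - weight @ key_new      # gap between that output and the new concept's output
    update = torch.outer(residual, key_new) / key_new.dot(key_new)
    return weight + update


# Toy usage: make a 4x3 layer treat a "snowy road" key like a "plain road" key.
torch.manual_seed(0)
W = torch.randn(4, 3)
k_snow, k_road = torch.randn(3), torch.randn(3)
W_edited = rank_one_edit(W, k_snow, k_road)
assert torch.allclose(W_edited @ k_snow, W @ k_road, atol=1e-5)
```

The rank-one form keeps the edit local: directions orthogonal to the new concept's key are perturbed as little as possible, which is the intuition behind changing a single prediction rule rather than retraining the model.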