Identifying and Benchmarking Natural Out-of-Context Prediction Problems

David Madras; Richard Zemel

Published: 09 Nov 2021, Last Modified: 26 May 2025, NeurIPS 2021 Poster

Keywords: robustness, invariance, spurious correlations, benchmark, out-of-context, domain generalization

TL;DR: We present a suite of naturally-arising "challenge sets" of out-of-context examples from within an existing computer vision benchmark.

Abstract: Deep learning systems frequently fail at out-of-context (OOC) prediction, the problem of making reliable predictions on uncommon or unusual inputs or subgroups of the training distribution. A number of benchmarks for measuring OOC performance have recently been introduced. In this work, we introduce a framework unifying the literature on OOC performance measurement, and demonstrate how rich auxiliary information can be leveraged to identify candidate sets of OOC examples in existing datasets. We present NOOCh: a suite of naturally-occurring "challenge sets", and show how varying notions of context can be used to probe specific OOC failure modes. Experimentally, we explore the tradeoffs between various learning approaches on these challenge sets and demonstrate how the choices made in designing OOC benchmarks can yield varying conclusions.
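
The abstract describes using auxiliary information to carve candidate OOC examples out of an existing dataset. The following is a minimal sketch of that idea, assuming each example carries a set of auxiliary annotation labels and that one "context" label is positively correlated with the target label. The names `Example`, `context_split`, `conditional_rate`, `accuracy`, and the labels "car" and "road" are hypothetical illustrations, not the NOOCh code or the paper's exact selection criterion.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical record: an image id plus its auxiliary annotation labels."""
    image_id: int
    annotations: FrozenSet[str]


def conditional_rate(examples: Sequence[Example], target: str, context: str,
                     context_present: bool) -> float:
    """Estimate P(target | context present) or P(target | context absent)."""
    pool = [ex for ex in examples if (context in ex.annotations) == context_present]
    if not pool:
        return 0.0
    return sum(target in ex.annotations for ex in pool) / len(pool)


def context_split(examples: Sequence[Example], target: str,
                  context: str) -> Tuple[List[Example], List[Example]]:
    """Split examples into in-context and out-of-context (OOC) groups.

    Assumes `context` is positively correlated with `target`. An example is
    treated as OOC when exactly one of the two is present, i.e. when a
    predictor relying only on the context cue would get it wrong.
    """
    in_ctx: List[Example] = []
    ooc: List[Example] = []
    for ex in examples:
        has_target = target in ex.annotations
        has_context = context in ex.annotations
        (in_ctx if has_target == has_context else ooc).append(ex)
    return in_ctx, ooc


def accuracy(preds: Dict[int, bool], examples: Sequence[Example], target: str) -> float:
    """Binary accuracy of `preds` (image_id -> predicted presence) on `examples`."""
    if not examples:
        return float("nan")
    correct = sum(preds[ex.image_id] == (target in ex.annotations) for ex in examples)
    return correct / len(examples)


# Toy data (hypothetical labels): "road" is the context cue for the target "car".
TOY_EXAMPLES = [
    Example(0, frozenset({"car", "road"})),
    Example(1, frozenset({"car", "road", "person"})),
    Example(2, frozenset({"road"})),   # context without target: OOC
    Example(3, frozenset({"car"})),    # target without context: OOC
    Example(4, frozenset({"person"})),
]

# Sanity-check the assumed positive correlation on this toy data (2/3 vs 1/2).
assert conditional_rate(TOY_EXAMPLES, "car", "road", True) > \
    conditional_rate(TOY_EXAMPLES, "car", "road", False)

in_context, out_of_context = context_split(TOY_EXAMPLES, "car", "road")

# A "shortcut" predictor that only looks at the context cue.
context_only_preds = {ex.image_id: "road" in ex.annotations for ex in TOY_EXAMPLES}

print(accuracy(context_only_preds, TOY_EXAMPLES, "car"))    # 0.6 overall
print(accuracy(context_only_preds, in_context, "car"))      # 1.0 in context
print(accuracy(context_only_preds, out_of_context, "car"))  # 0.0 out of context
```

The shortcut predictor looks acceptable on average but fails on every OOC example, which is the kind of gap a challenge set is meant to expose. The abstract notes that varying notions of context can be used; this single-cue rule is only one simple choice among them.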

Code: https://github.com/dmadras/nooch