AI and the Everything in the Whole Wide World Benchmark

Inioluwa Deborah Raji; Emily Denton; Emily M. Bender; Alex Hanna; Amandalynne Paullada

Published: 11 Oct 2021, Last Modified: 04 May 2025

NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: evaluation, dataset, benchmark

TL;DR: The benchmarking paradigm in machine learning is incompatible with claims to performance on underspecified general tasks

Abstract: There is a tendency across different subfields in AI to see value in a small collection of influential benchmarks, which we term 'general' benchmarks. These benchmarks operate as stand-ins or abstractions for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore how such benchmarks are designed, constructed and used in order to reveal key limitations of their framing as the functionally 'general' broad measures of progress they are set up to be.
