dcbench: a benchmark for data-centric AI systems

Sabri Eyuboglu, Bojan Karlas, Christopher Ré, Ce Zhang, James Zou

Published: 2022, Last Modified: 12 May 2023, DEEM@SIGMOD 2022, Readers: Everyone

Abstract: The development workflow for today's AI applications has grown far beyond the standard model training task. This workflow typically consists of various data and model management tasks. It includes a "data cycle" aimed at producing high-quality training data, and a "model cycle" aimed at managing trained models on their way to production. This broadened workflow has opened a space for already emerging tools and systems for AI development. However, as a research community, we are still missing standardized ways to evaluate these tools and systems. In a humble effort to get this wheel turning, we developed dcbench, a benchmark for evaluating systems for data-centric AI development. In this report, we present the main ideas behind dcbench, some benchmark tasks that we included in the initial release, and a short summary of its implementation.
