AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels

Nicholas Roberts; Xintong Li; Tzu-Heng Huang; Dyah Adila; Spencer Schoenberg; Cheng-Yu Liu; Lauren Pick; Haotian Ma; Aws Albarghouthi; Frederic Sala

Published: 17 Sept 2022, Last Modified: 20 Apr 2025. NeurIPS 2022 Datasets and Benchmarks. Readers: Everyone

Keywords: weak supervision, automated weak supervision, foundation models, automl, diverse tasks

Abstract: Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings, namely a set of diverse application domains on which it has previously been difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, should a practitioner use an AutoWS method to generate additional labels, or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning? We observe that it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
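To make the idea of aggregating noisy labeling functions concrete, here is a minimal sketch on a toy spam-detection task. Everything in it is an illustrative assumption: the three keyword-based LFs, the `ABSTAIN = -1` convention, and the example comments are hypothetical, and plain majority voting stands in for the learned label models that WS systems typically use. It is not the benchmark's code or any AutoWS method from the paper.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: an LF returns -1 when it has no opinion

# Hypothetical hand-written labeling functions (1 = spam, 0 = not spam).
def lf_contains_link(text):
    return 1 if "http" in text else ABSTAIN

def lf_mentions_subscribe(text):
    return 1 if "subscribe" in text.lower() else ABSTAIN

def lf_short_praise(text):
    return 0 if len(text.split()) <= 4 and "love" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_subscribe, lf_short_praise]

def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Aggregate LF votes by simple majority; abstain if no LF fires or on a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # top two labels are tied
    return counts[0][0]

comments = [
    "Subscribe to my channel http://example.com",
    "love this song",
    "What year was this recorded?",
]
weak_labels = [majority_vote(c) for c in comments]
print(weak_labels)
```

Examples on which every LF abstains (or the vote ties) receive no weak label. AutoWS methods, as described in the abstract, aim to generate the labeling functions themselves from a small labeled set rather than relying on hand-written rules like the ones above.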

Author Statement: Yes

TL;DR: We introduce AutoWS-Bench-101: a benchmarking framework for automated weak supervision techniques on diverse tasks.

URL: https://github.com/Sala-Group/AutoWS-Bench-101

Dataset Url: https://github.com/Sala-Group/AutoWS-Bench-101

License: Code license: Apache 2.0
Dataset licenses:
• MNIST: CC BY-SA 3.0
• CIFAR-10: CC BY 4.0 (on https://www.tensorflow.org/datasets/catalog/cifar10)
• Spherical MNIST: MIT
• Permuted MNIST: Apache 2.0
• Navier-Stokes: MIT
• ECG: ODC-BY 1.0
• EMBER: MIT
• YouTube: Apache 2.0
• Yelp: https://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/dc1cabe7cb95/assets/vendor/Dataset_User_Agreement.pdf
• IMDb: https://www.imdb.com/conditions?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=3aefe545-f8d3-4562-976a-e5eb47d1bb18&pf_rd_r=FP5VRV15YGQSTMJVFZDM&pf_rd_s=center-1&pf_rd_t=60601&pf_rd_i=interfaces&ref_=fea_mn_lk2

Supplementary Material: zip

Contribution Process Agreement: Yes

In Person Attendance: Yes
