Learning Hyper Label Model for Programmatic Weak Supervision

Renzhi Wu; Shen-En Chen; Jieyu Zhang; Xu Chu

Published: 01 Feb 2023, Last Modified: 18 Feb 2023, ICLR 2023 poster, Readers: Everyone

Keywords: Programmatic Weak Supervision, Data Programming, Label Model

Abstract: To reduce human annotation effort, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and uses a label model to aggregate the outputs of multiple LFs into training labels. Most existing label models require a parameter learning step for each dataset. In this work, we present a hyper label model that, once learned, infers the ground-truth labels for each dataset in a single forward pass, without dataset-specific parameter learning. The hyper label model approximates an optimal analytical (yet computationally intractable) solution for the ground-truth labels. We train the model on synthetic data generated in a way that ensures the model approximates this analytical optimal solution, and we build the model on a Graph Neural Network (GNN) so that its predictions are invariant to the permutation of LFs and equivariant to the permutation of data points. On 14 real-world datasets, our hyper label model outperforms the best existing methods in both accuracy (by 1.4 points on average) and efficiency (by six times on average). Our code is available at https://github.com/wurenzhi/hyper_label_model
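
To make the aggregation setting and the symmetry requirement concrete, here is a minimal sketch in plain Python. It is not the hyper label model and does not use the authors' code: it uses simple majority vote as a stand-in label model, and the names `ABSTAIN` and `majority_vote`, the binary label convention, and the tie-breaking rule are all assumptions made for illustration. It shows a label matrix (rows are data points, columns are LFs) and checks the two properties the abstract asks of a label model: the output is unchanged when LFs are reordered, and it is reordered in the same way when data points are reordered.

```python
from typing import List

ABSTAIN = -1  # assumed convention: an LF that does not vote returns -1


def majority_vote(label_matrix: List[List[int]]) -> List[int]:
    """Aggregate binary weak labels per data point by majority vote.

    label_matrix[i][j] is the output of LF j on data point i, in {0, 1, ABSTAIN}.
    Ties and all-abstain rows fall back to class 0 in this sketch.
    """
    labels = []
    for row in label_matrix:
        pos = sum(1 for v in row if v == 1)
        neg = sum(1 for v in row if v == 0)
        labels.append(1 if pos > neg else 0)
    return labels


# Four data points, three labeling functions (hypothetical values).
X = [
    [1, 1, 0],
    [0, ABSTAIN, 0],
    [1, ABSTAIN, ABSTAIN],
    [0, 1, 1],
]
y = majority_vote(X)

# Reordering the LFs (columns) should leave the labels unchanged (invariance).
lf_perm = [2, 0, 1]
X_lf_permuted = [[row[j] for j in lf_perm] for row in X]
y_lf_permuted = majority_vote(X_lf_permuted)

# Reordering the data points (rows) should reorder the labels identically
# (equivariance).
point_perm = [3, 1, 0, 2]
X_point_permuted = [X[i] for i in point_perm]
y_point_permuted = majority_vote(X_point_permuted)

print(y)                 # labels for the original ordering
print(y_lf_permuted)     # same labels after permuting LFs
print(y_point_permuted)  # labels permuted along with the data points
```

Majority vote satisfies both properties trivially because it only counts votes per row. A learned model does not get them for free, which is the stated reason for building on a GNN.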

Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)

TL;DR: A hyper label model to aggregate weak labels from multiple weak supervision sources to infer the ground-truth labels in a single forward pass

Supplementary Material: zip
