Encoding Expert Knowledge into Federated Learning using Weak Supervision

Sebastian Caldas; Mononito Goswami; Artur Dubrawski

15 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: general machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Federated Learning, Weak Supervision, Sequential Decision Making, Time-series

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We describe a method to capture expert knowledge in the form of annotations for distributed, on-device data.

Abstract: Learning from on-device data has enabled intelligent mobile applications ranging from smart keyboards to apps that predict abnormal heartbeats. However, due to the sensitive and distributed nature of such data, it is onerous to acquire the expert annotations required to train traditional supervised machine learning pipelines. Consequently, existing federated learning techniques that learn from on-device data mostly rely on unsupervised approaches and are unable to capture expert knowledge via data annotations. In this work, we explore how to codify this expert knowledge using programmatic weak supervision, a principled framework that leverages labeling functions (i.e., heuristic rules) to annotate vast quantities of data without direct access to the data itself. We introduce Weak Supervision Heuristics for Federated Learning (WSHFL), a method that interactively mines and leverages labeling functions to annotate on-device data in cross-device federated settings. We conduct experiments across two data modalities, text and time-series, and demonstrate that WSHFL achieves competitive performance compared to fully supervised baselines without the need for direct data annotations.
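
To make the notion of a labeling function concrete, the following is a minimal illustrative sketch in Python and is not the WSHFL method itself. It assumes a binary text task, three hypothetical keyword heuristics (`lf_contains_great`, `lf_contains_terrible`, `lf_ends_with_exclamation`), and a plain majority vote as a stand-in for the label model that weak supervision frameworks typically use to aggregate votes. None of these names or rules come from the submission.

```python
from collections import Counter

# Label conventions are assumptions made for this sketch only.
ABSTAIN = -1
NEGATIVE = 0
POSITIVE = 1


# Hypothetical labeling functions: each encodes one heuristic rule and
# either votes for a class or abstains when the rule does not apply.
def lf_contains_great(text):
    return POSITIVE if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN


def lf_ends_with_exclamation(text):
    return POSITIVE if text.strip().endswith("!") else ABSTAIN


LABELING_FUNCTIONS = [
    lf_contains_great,
    lf_contains_terrible,
    lf_ends_with_exclamation,
]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Aggregate labeling-function votes by simple majority.

    Returns ABSTAIN when no function fires or when the top vote is tied.
    Majority vote is a simplification chosen for this sketch.
    """
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for example in ["This movie was great!", "Terrible plot", "It was fine."]:
        print(example, "->", weak_label(example))
```

In a federated setting, rules like these could in principle be evaluated on each device so that raw examples never leave it; how WSHFL mines and aggregates labeling functions is described in the paper, not in this sketch.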

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 264
