Keywords: Weak Supervision, Active Learning, Fuzzy Logic, AI in Healthcare
Abstract: Supervised machine learning (ML) has fueled major advances in several domains such as health, education, and governance. However, most modern ML methods rely on vast quantities of point-by-point hand-labeled training data. In domains such as clinical research, where data collection and its careful characterization are particularly expensive and tedious, this reliance on pointillistically labeled data is one of the biggest roadblocks to the adoption of modern data-hungry ML algorithms. Data programming, a framework for learning from weak supervision, attempts to overcome this bottleneck by generating probabilistic training labels from simple yet imperfect heuristics obtained a priori from domain experts. We present WARM, Active Refinement of Weakly Supervised Models, a principled approach to iterative and interactive improvement of weakly supervised models via active learning. WARM directs domain experts' attention to a small set of data points that, when annotated, would most improve the accuracy of the label model's probabilistic output. Gradient backpropagation is then used to iteratively update the decision parameters of the label model's heuristics. Experiments on multiple real-world medical classification datasets reveal that WARM can substantially improve the accuracy of probabilistic labels, a direct measure of training data quality, with as few as 30 queries to clinicians. Additional experiments with domain shift and artificial noise in the labeling functions (LFs) demonstrate WARM's ability to adapt the heuristics and the end model to changing population characteristics, as well as its robustness to mis-specification of the LFs acquired from domain experts. These capabilities make WARM a potentially useful tool for deploying, maintaining, and auditing weakly supervised systems in practice.
One-sentence Summary: We present WARM, Active Refinement of Weakly Supervised Models, a principled approach to iterative and interactive improvement of weakly supervised models via active learning.
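The sketch below is a minimal toy illustration of the kind of loop the abstract describes, not the authors' implementation: labeling functions are assumed to be sigmoid-relaxed threshold heuristics with differentiable decision parameters, the label model is approximated by simple averaging of LF votes, acquisition uses uncertainty sampling, the data are synthetic, and the expert oracle is simulated. All names and parameters here are hypothetical.

```python
# Hypothetical WARM-style refinement loop (toy sketch, not the authors' code).
# Each LF is a soft threshold heuristic on one feature; the "label model" is the
# mean of the LF votes, yielding a probabilistic label in [0, 1]. Active learning
# picks the most uncertain point, an oracle (standing in for a clinician) supplies
# its label, and the loss is backpropagated to refine the LF thresholds.
import torch

torch.manual_seed(0)
n_points, n_lfs = 200, 3
X = torch.rand(n_points, n_lfs)                       # toy feature matrix
y_true = (X.mean(dim=1) > 0.5).float()                # hidden ground truth

# Differentiable decision parameters of the heuristics (initially mis-specified).
thresholds = torch.nn.Parameter(torch.tensor([0.3, 0.6, 0.7]))
optimizer = torch.optim.Adam([thresholds], lr=0.1)

def probabilistic_labels(X, thresholds, temperature=10.0):
    """Soft LF votes (sigmoid-relaxed thresholds) aggregated by averaging."""
    votes = torch.sigmoid(temperature * (X - thresholds))
    return votes.mean(dim=1)

queried_idx, queried_labels = [], []
for step in range(30):                                # budget of 30 expert queries
    # Acquisition: query the point whose probabilistic label is most uncertain.
    with torch.no_grad():
        probs = probabilistic_labels(X, thresholds)
        uncertainty = -(probs - 0.5).abs()
        if queried_idx:
            uncertainty[queried_idx] = -float("inf")  # never re-query a point
        idx = int(uncertainty.argmax())
    queried_idx.append(idx)
    queried_labels.append(y_true[idx])                # simulated clinician annotation

    # Backpropagate on all queried points to refine the LF decision parameters.
    targets = torch.stack(queried_labels)
    for _ in range(20):
        optimizer.zero_grad()
        p = probabilistic_labels(X[queried_idx], thresholds).clamp(1e-6, 1 - 1e-6)
        loss = torch.nn.functional.binary_cross_entropy(p, targets)
        loss.backward()
        optimizer.step()

with torch.no_grad():
    acc = ((probabilistic_labels(X, thresholds) > 0.5).float() == y_true).float().mean()
print(f"probabilistic-label accuracy after 30 queries: {acc:.2f}")
```

In this sketch the averaging aggregator is a stand-in for a learned label model, and uncertainty sampling is one possible acquisition strategy; the paper's actual query criterion and label model may differ.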