Keywords: weak supervision, meta-learning, biomedical relationship extraction, semi-supervised learning, ensemble learning
TL;DR: We propose and apply a meta-learning methodology based on Weak Supervision, for combining Semi-Supervised and Ensemble Learning on the task of Biomedical Relationship Extraction.
Abstract: Natural language understanding research has recently shifted towards complex Machine Learning and Deep Learning algorithms. Such models often outperform their simpler counterparts significantly. However, their performance relies on the availability of large amounts of labeled data, which are rarely available. To tackle this problem, we propose a methodology for extending training datasets to arbitrarily big sizes and training complex, data-hungry models using weak supervision. We apply this methodology on biomedical relation extraction, a task where training datasets are excessively time-consuming and expensive to create, yet has a major impact on downstream applications such as drug discovery. We demonstrate in two small-scale controlled experiments that our method consistently enhances the performance of an LSTM network, with performance improvements comparable to hand-labeled training data. Finally, we discuss the optimal setting for applying weak supervision using this methodology.
Archival Status: Archival
Subject Areas: Machine Learning, Natural Language Processing, Information Extraction, Applications: Biomedicine
9 Replies
Loading