Track: Tiny Paper Track
Keywords: knowledge graph, academic articles, diet, NCD, augmentation
TL;DR: Using a Knowledge Graph extracted from literature to improve classification performance on medical datasets.
Abstract: We present a novel approach to augmenting medical and biological prediction tasks with knowledge derived from literature. Specifically, while typical modern medical and biological datasets may contain a large number of biomarker and genetic features per subject, the number of subjects often remains limited. In addition, while many of the collected features are relatively easy to measure, individually they are typically not strongly informative with regard to the higher-order prediction tasks of interest. This small-sample setting limits and complicates the applicability of standard machine learning prediction methods because of the risk of overfitting.
At the same time, decades of medical research have produced extensive knowledge in the form of documented associations between various biological entities. Here we propose a framework for integrating this evidence-based knowledge into predictive models, addressing several challenges in the use of qualitative literature findings to obtain more informative representations of quantitative data.
The stages of the approach include: constructing a Knowledge Graph by extracting entity relationships from the literature, constructing a probability model consistent with those relationships, and using the model to improve predictions via feature augmentation and sparsity. Our initial evaluation results demonstrate improved prediction accuracy on biomarkers in the NutriTech dataset.
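To make the stages listed in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. All entity names, edges, and function names are hypothetical. The probability model stage is not reproduced here; as a stand-in, the sketch uses simple graph-neighborhood heuristics: sparsity keeps only features within a few hops of the target in the graph, and augmentation adds the mean of each kept feature's measured neighbors.

```python
from collections import defaultdict

# Hypothetical literature-derived associations (entity_a, entity_b).
# These pairs are illustrative only and are not taken from the paper.
EDGES = [
    ("fiber_intake", "glucose"),
    ("glucose", "insulin"),
    ("saturated_fat", "ldl_cholesterol"),
]


def build_graph(edges):
    """Build an undirected adjacency map from extracted entity pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def select_features(graph, target, max_hops=2):
    """Sparsity stand-in: keep entities within max_hops of the target."""
    frontier, seen = {target}, {target}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    return sorted(seen - {target})


def augment(sample, graph, features):
    """Augmentation stand-in: add the mean of each feature's measured neighbors.

    `sample` is assumed to hold input measurements only (not the target).
    """
    out = {f: sample[f] for f in features if f in sample}
    for f in features:
        vals = [sample[n] for n in graph[f] if n in sample]
        if vals:
            out[f + "_nbr_mean"] = sum(vals) / len(vals)
    return out


graph = build_graph(EDGES)
features = select_features(graph, target="insulin")
sample = {"fiber_intake": 20.0, "glucose": 5.0, "saturated_fat": 30.0}
augmented = augment(sample, graph, features)
```

In this toy example, `saturated_fat` is dropped because the graph gives it no path to the hypothetical target `insulin`, which reduces the number of features a downstream model must fit from few subjects.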
Attendance: Mark Kozdoba
Submission Number: 33