Abstract (author summary): Machine learning models have proven successful at predicting diseases and other human phenotypes from microbiome data; however, gaining insight from such complex models is often challenging. To address this, we developed endoR, an R package for enhanced interpretation of tree ensemble models (e.g., random forests), the most popular and highest-performing machine learning models applied to microbiome data to date. Our method simplifies models and extracts information on associations between predictors (microbiome data, host metadata, and covariates) and a predicted trait (e.g., disease versus healthy). endoR has two main strengths: i) the ability to capture interactions between predictors, and ii) regularization steps that avoid overfitting. Through extensive validations, we show that endoR is comparable in accuracy to other common approaches while easing and enhancing model interpretation. We applied endoR to gain insight into a complex syntrophic network of human gut methanogens and bacterial fermenters. Overall, endoR is a powerful tool for gaining insight from tree ensemble models applied to microbiome data.