Reject and Cascade Classifier with Subgroup Discovery for Interpretable Metagenomic Signatures

Published: 2021, Last Modified: 13 Nov 2024. PKDD/ECML Workshops (1) 2021. License: CC BY-SA 4.0
Abstract: Over the past decade, technological advances have made high-speed, high-resolution sequencing of genetic material possible at ever lower cost (from millions of dollars to one hundred dollars). In this context, the human microbiome has demonstrated its ability to support the stratification and classification of various human diseases. The gut microbiota is thus set to play a key role in precision medicine as a “super-integrator” of patient status, and identifying metagenomic signatures is becoming increasingly important. To address the interpretability/accuracy trade-off, we propose a hybrid approach based on a cascade classifier that combines a first step of Subgroup Discovery (for interpretability) with a subsequent classifier model (for accuracy). With this approach, different interpretable signatures stratify as many patients as possible, while the remaining patients are handled by a default non-interpretable signature. We evaluate the approach on several datasets from the NCBI public repository covering different diseases (colorectal cancer, cirrhosis, diabetes, obesity), assessing its ability to build metagenomic disease signatures that are both accurate and interpretable. The results show that the approach achieves performance comparable or superior to state-of-the-art approaches while offering better interpretability than black-box models.
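The cascade idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Rule` and `CascadeClassifier` names, the single-threshold rule form, the taxon names, and the constant fallback are all hypothetical, chosen only to show how interpretable rules are tried first and uncovered samples are passed on to a default model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sample representation: taxon name -> relative abundance.
Sample = Dict[str, float]


@dataclass
class Rule:
    """Assumed form of an interpretable subgroup: feature >= threshold -> label."""
    feature: str
    threshold: float
    label: int

    def covers(self, sample: Sample) -> bool:
        # A missing taxon is treated as zero abundance (an assumption).
        return sample.get(self.feature, 0.0) >= self.threshold

    def describe(self) -> str:
        return f"{self.feature} >= {self.threshold}"


class CascadeClassifier:
    """Try interpretable rules first; defer uncovered samples to a fallback model."""

    def __init__(self, rules: List[Rule], fallback: Callable[[Sample], int]):
        self.rules = rules
        self.fallback = fallback

    def predict_one(self, sample: Sample) -> Tuple[int, Optional[str]]:
        # Stage 1: the first rule that covers the sample decides and explains.
        for rule in self.rules:
            if rule.covers(sample):
                return rule.label, rule.describe()
        # Stage 2: no rule applies, so the non-interpretable model decides.
        return self.fallback(sample), None


# Placeholder fallback standing in for any trained black-box classifier.
clf = CascadeClassifier(
    rules=[Rule("taxon_A", 0.2, 1), Rule("taxon_B", 0.5, 0)],
    fallback=lambda sample: 0,
)

covered = clf.predict_one({"taxon_A": 0.3})    # decided by the first rule
uncovered = clf.predict_one({"taxon_A": 0.1})  # deferred to the fallback
```

In this sketch, a prediction that comes with a rule description is the interpretable case, and a prediction with `None` corresponds to the default non-interpretable signature. How the rules are actually mined (the Subgroup Discovery step) is outside the scope of the example.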