Keywords: diagnostics, adaptive immune receptor repertoires, generalization, robustness, causal inference, biomarkers
TL;DR: Causal modeling can improve the generalization of biomarkers in high-dimensional domains, and adaptive immune receptor repertoires represent an interesting testbed for this approach.
Abstract: Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we discuss building a diagnostic based on a specific, recently established high-dimensional biomarker, adaptive immune receptor repertoires (AIRRs), and investigate how causal modeling may improve the robustness and generalization of the resulting diagnostics. We examine how the main biological and experimental factors of the AIRR domain may influence the learned biomarkers, especially in the presence of dataset shifts, and provide simulations of such effects. We conclude that causal modeling could improve AIRR-based diagnostics, and that the AIRR field, with its complex, high-dimensional variables, could in turn serve as a powerful testbed for causal modeling.