LymphoML: An interpretable artificial intelligence-based method identifies morphologic features that correlate with lymphoma subtype

Vivek Shankar, Xiaoli Yang, Vrishab Krishna, Brent Tan, Oscar Silva, Rebecca Rojansky, Andrew Y. Ng, Fabiola Valvert, Edward Briercheck, David Weinstock, Yasodha Natkunam, Sebastian Fernandez-Pol, Pranav Rajpurkar

Published: 10 Dec 2023, Last Modified: 11 Sept 2024, Venue: Machine Learning for Health (ML4H), Readers: Everyone, License: CC BY 4.0

Abstract: Accurate lymphoma classification is challenged by morphological diversity. We develop LymphoML, an interpretable machine learning approach for lymphoma subtyping into eight diagnostic categories. LymphoML introduces a pipeline to process H&E-stained TMA cores, segment nuclei and cells, compute features encompassing morphology, texture, and architecture, and apply gradient-boosted models to make diagnostic predictions. We find that LymphoML's interpretable approach provides superior diagnostic yield compared to black-box deep learning when applied to a dataset of TMAs without patch-level annotations of lymphoma and with limited tissue volume for specific categories. Using SHapley Additive exPlanation (SHAP) analysis, we assess the impact of each feature group on model prediction and find that nuclear shape features are most discriminative. We analyze the most impactful nuclear shape features (e.g., minor axis length) and find that they are most discriminative for DLBCL (F1-score: 78.7%) and classic Hodgkin lymphoma (F1-score: 74.5%). Our work represents the first assessment of the size and shape differences between DLBCL and non-DLBCL. Augmenting our model with a standardized panel of 6 immunostains results in a diagnostic accuracy (85.3%) similar to that of a 46-stain panel (86.1%). By maximizing diagnostic yield from scarce H&E tissue and minimizing costly immunostains, LymphoML integrates interpretable machine learning into computational pathology.
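
The abstract names minor axis length as an example of an impactful nuclear shape feature but does not say how it is computed. The sketch below is a minimal illustration under an assumption: it uses a common moment-based definition, in which the axis lengths are those of the ellipse that has the same second central moments as the segmented nucleus. The function name `ellipse_axis_lengths` and the toy rectangle are hypothetical and are not taken from LymphoML.

```python
import math


def ellipse_axis_lengths(pixels):
    """Return (major, minor) axis lengths of the ellipse that has the same
    second central moments as the given set of (row, col) pixels.

    Assumption: this moment-based definition is used for illustration only;
    the paper's actual feature extraction may differ.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("empty region")

    # Centroid of the region.
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n

    # Second central moments (covariance of the pixel coordinates).
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n

    # Eigenvalues of the 2x2 covariance matrix, in closed form.
    half_trace = (var_r + var_c) / 2
    spread = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    lam_max = half_trace + spread
    lam_min = max(half_trace - spread, 0.0)  # guard against rounding below 0

    # For a solid ellipse, the variance along a principal axis is
    # (semi-axis)^2 / 4, so the full axis length is 4 * sqrt(variance).
    return 4 * math.sqrt(lam_max), 4 * math.sqrt(lam_min)


# Toy "nucleus": a filled 10 x 20 pixel rectangle (hypothetical data).
toy_nucleus = [(r, c) for r in range(10) for c in range(20)]
major, minor = ellipse_axis_lengths(toy_nucleus)
```

In a pipeline like the one described, a per-nucleus value such as `minor` would be aggregated per TMA core before being passed to a gradient-boosted model; that aggregation step is likewise an assumption here, not a detail stated in the abstract.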