FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking
Abstract: This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground-truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, monocytes, mast cells, and hematopoietic stem/progenitor cells (HSPCs). Experiments use supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs achieve the best performance by exploiting spatial relationships in graph-encoded data. The benchmark enables standardized evaluation of clinically relevant classification tasks, along with exploratory analyses that provide insight into hematological cell phenotypes. This is the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset, and it aims to enable the development and rigorous assessment of novel methods for single-cell analysis.
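The abstract attributes the GNN advantage to relationships between cells in graph-encoded data but does not spell out how the graph is built. A minimal sketch, assuming a k-nearest-neighbor graph over each cell's twelve marker intensities (an assumption; FlowCyt's actual construction may differ) and a single mean-aggregation step standing in for one GNN layer, looks like this. The function names `knn_graph` and `mean_aggregate` are hypothetical.

```python
import numpy as np


def knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array of neighbor indices per cell, excluding the cell itself.

    Assumption: cells are linked to their k nearest neighbors in marker space
    (Euclidean distance). This builds a dense n x n distance matrix, so it is
    only suitable for small samples, not 1 million cells per patient.
    """
    # Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2 a.b
    sq = np.sum(features ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # exclude self-loops
    # Column indices of the k smallest distances in each row
    return np.argsort(dists, axis=1)[:, :k]


def mean_aggregate(features: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """One round of mean-neighbor message passing.

    Each cell's own features are concatenated with the mean of its neighbors'
    features, giving an (n, 2d) representation (a simplified, weight-free
    stand-in for a GraphSAGE-style layer).
    """
    neigh_mean = features[neighbors].mean(axis=1)  # (n, k, d) -> (n, d)
    return np.concatenate([features, neigh_mean], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cells = rng.normal(size=(100, 12))  # hypothetical: 100 cells x 12 markers
    h = mean_aggregate(cells, knn_graph(cells, k=5))
    print(h.shape)  # (100, 24)
```

A trainable GNN would apply learned weights and a nonlinearity to the aggregated features; the point here is only that each cell's representation absorbs information from cells that are close to it in marker space.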