Keywords: self-supervision, representation learning, immunology, biology, single-cell, cytometry, set, set representations
TL;DR: We developed a self-supervised set representation learning model that produces fixed-size vector representations of single-cell data
Abstract: The interrogation of cellular states and interactions in immunology research is an ever-evolving task, requiring methods that keep pace with increasingly high-dimensional data. Cytometry enables high-dimensional profiling of immune cells, but its analysis is hindered by the complexity and variability of the data. We present MAESTRO, a self-supervised set representation learning model that generates vector representations of set-structured data, which we apply to learn immune profiles from cytometry data. Unlike previous approaches that only learn cell-level representations, MAESTRO uses all of a sample's cells to learn a set representation. MAESTRO leverages specialized attention mechanisms to handle sets with variable numbers of cells and to ensure permutation invariance, coupled with an online tokenizer trained by self-distillation. We benchmarked our model against existing cytometry approaches and other machine learning methods that have never been applied to cytometry. Our model outperforms existing approaches in retrieving cell-type proportions and capturing clinically relevant features for downstream tasks such as disease diagnosis and immune cell profiling.
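The abstract's core architectural idea — pooling a variable-sized set of cells into one sample-level vector in a way that is invariant to cell order — can be illustrated with a minimal attention-pooling sketch. This is not MAESTRO's actual architecture (which is not specified here); all function names, weight shapes, and the single-query pooling scheme below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(cells, query, Wk, Wv):
    """Pool a variable-sized set of cells, shape (n, d), into one vector.

    A learned query attends over all cells; because the output is a
    weighted sum over rows, it is invariant to the order of `cells`
    and defined for any number of cells n. (Illustrative sketch only.)
    """
    keys = cells @ Wk                                  # (n, d_k)
    values = cells @ Wv                                # (n, d_v)
    scores = query @ keys.T / np.sqrt(keys.shape[1])   # (n,)
    weights = softmax(scores)                          # attention over cells
    return weights @ values                            # (d_v,) set embedding

rng = np.random.default_rng(0)
d, dk, dv = 8, 8, 8
Wk, Wv = rng.normal(size=(d, dk)), rng.normal(size=(d, dv))
query = rng.normal(size=dk)

cells = rng.normal(size=(100, d))        # one sample: 100 cells
shuffled = cells[rng.permutation(100)]   # same cells, different order
assert np.allclose(attention_pool(cells, query, Wk, Wv),
                   attention_pool(shuffled, query, Wk, Wv))
```

The assertion checks the permutation-invariance property the abstract claims: shuffling the cells of a sample leaves its set representation unchanged, and the same function accepts samples with any number of cells.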
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11153