Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: tiny / short paper (2-4 pages excluding references; extended abstract format)
Keywords: benchmark, foundation models, transcriptomics, immunology, inflammation, drug development, biological representations, rna, gene expression, evaluation
TL;DR: A Transcriptomic Benchmark for Foundation Models in Immunology and Inflammation Drug Development
Abstract: Foundation models for transcriptomics are increasingly evaluated on technical metrics disconnected from drug development. We introduce an immunology and inflammation (I&I) benchmark of 35 tasks across 8 diseases, organized along the drug development pipeline: target discovery, preclinical translation, and clinical applications. Tasks span treatment response, clinical severity, molecular perturbations, and patient endotypes, with cross-species, cross-disease, and cross-platform transfer to test translational generalization. Patient sample sizes range from 9 to 713, reflecting data-limited regimes typical of early clinical research. We evaluate general-purpose and domain-specific foundation models against statistical baselines. Foundation models achieve the largest gains on translational tasks (perturbation prediction and cross-species transfer), where baselines fail. Treatment outcome prediction and patient stratification also favor foundation models, whereas on clinical severity prediction feature-selected regression remains competitive. A domain-specific model (EVA) pretrained on I&I data outperforms general-purpose models across most task categories. Benchmark performance improves with the number of pretraining steps without saturating, suggesting that the benchmark can serve as a diagnostic for model development.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 26