Track: long paper (up to 6 pages)
Keywords: Generative models for NAs, gene expression, cancer models, cancer cell lines, preclinical cancer models, tumor samples, stromal cell contamination, in-vitro adaptation, biomarkers, conditional variational auto-encoders, CVAE, gene expression profiles, pan-cancer, alignment, batch effects, clinical precision medicine, machine learning, bioinformatics, genomic data, translational cancer research, regularization techniques, cancer research workflows, transcriptomic harmonization, tumor transcriptomes, model comparison
TL;DR: We introduce RNAlign, a CVAE framework with novel regularization that achieves state-of-the-art pan-cancer alignment of cancer cell line and tumor gene expression profiles.
Abstract: Preclinical cancer models such as cancer cell lines (CLs) are central to cancer research but may poorly represent tumor samples because of fundamental differences such as stromal cell contamination in tumors or in-vitro adaptation of CLs. This hinders the translation of new biomarkers and therapeutics into the clinic, leading to false leads, failed clinical trials, and the need for expensive multiomics pipelines to reconcile data sets. In this work, we build on conditional variational auto-encoders (CVAEs) to enable direct comparison of CLs with tumors and selection of the most representative CLs for cancer research. We introduce RNAlign (pronounced "RNA-align"), a CVAE framework with novel regularization techniques for pan-cancer alignment of tumor and CL gene expression profiles. The learned transformation achieves state-of-the-art removal of the most significant differences between the two sample types while preserving biologically important subtype information. The framework is extendable to other tumor models such as organoids and can be integrated directly into existing workflows to guide clinical precision medicine.
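To make the CVAE idea concrete, the sketch below shows a generic conditional variational auto-encoder forward pass in NumPy, where the condition is the sample type. It is not RNAlign's architecture: it omits the paper's novel regularization terms and any training loop, uses untrained random linear layers, and assumes (hypothetically) a one-hot condition with 0 = tumor and 1 = cell line. The names `encode`, `decode`, `cvae_loss`, and `translate` and all layer sizes are illustrative. The key point is that the encoder and decoder both see the condition, so a CL profile can be encoded under the "cell line" label and decoded under the "tumor" label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; condition index 0 = tumor, 1 = cell line (assumption).
N_GENES, N_LATENT, N_COND = 50, 8, 2

def one_hot(label, n=N_COND):
    v = np.zeros(n)
    v[label] = 1.0
    return v

def init_linear(n_in, n_out):
    # Untrained random weights, for illustration only.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

# Encoder q(z | x, c): one linear layer emitting mean and log-variance.
W_enc, b_enc = init_linear(N_GENES + N_COND, 2 * N_LATENT)
# Decoder p(x | z, c): one linear layer back to gene-expression space.
W_dec, b_dec = init_linear(N_LATENT + N_COND, N_GENES)

def encode(x, c):
    h = np.concatenate([x, one_hot(c)]) @ W_enc + b_enc
    return h[:N_LATENT], h[N_LATENT:]  # mu, logvar

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, c):
    return np.concatenate([z, one_hot(c)]) @ W_dec + b_dec

def cvae_loss(x, c):
    mu, logvar = encode(x, c)
    z = reparameterize(mu, logvar)
    x_hat = decode(z, c)
    recon = np.sum((x - x_hat) ** 2)  # Gaussian reconstruction term (up to constants)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))  # KL(q(z|x,c) || N(0, I))
    return recon, kl

def translate(x, c_from, c_to):
    # Encode under the source condition, decode under the target condition.
    mu, _ = encode(x, c_from)
    return decode(mu, c_to)

# Usage: a random vector stands in for a normalized CL profile (not real data).
x_cl = rng.normal(size=N_GENES)
x_cl_as_tumor = translate(x_cl, c_from=1, c_to=0)
```

In a trained model of this kind, the KL term pushes CL and tumor profiles toward a shared latent space, while the condition input lets the decoder absorb sample-type-specific differences; any additional alignment regularization would be added on top of this loss.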
Submission Number: 12