Keywords: Multi-Omics, Foundation Models, Computational Biology
TL;DR: Autoencoder for multi-omics reconstruction and representation learning.
Abstract: The high cost of functional molecular assays and the prevalence of missing modalities and unmatched samples in computational biology create significant barriers to comprehensive multi-omic profiling, which is essential for capturing and reasoning over molecules, cells, tissues, and organisms.
This work proposes ARO, an autoencoder that learns meaningful representations from multi-omics cancer data and supports the reconstruction of missing and unpaired modalities. In contrast to increasingly large and complex models, e.g. Foundation Models (FMs), ARO prioritizes practical applicability in limited or incomplete data settings.
ARO reconstructs missing modalities with low error (MSE of 0.166 on validation data and 0.168 on test data), and its learned latent embeddings enable a downstream cancer classification task. Our findings indicate that analyzing diverse molecular layers as a single integrated system offers a reliable and cost-efficient approach that reduces dependence on large-scale experimental testing while still supporting multi-omic exploration in limited data settings.
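For readers unfamiliar with the reported metric, the following minimal sketch shows how a mean squared error (MSE) between a held-out modality and its reconstruction could be computed. It is an illustration only and not the authors' evaluation code: the function name `reconstruction_mse`, the averaging over all samples and features, and the toy values are assumptions introduced here.

```python
from typing import Sequence


def reconstruction_mse(
    true_rows: Sequence[Sequence[float]],
    predicted_rows: Sequence[Sequence[float]],
) -> float:
    """Mean squared error over every feature of every sample.

    Hypothetical helper: averaging over all entries is an assumption,
    not necessarily the protocol used for the reported scores.
    """
    if len(true_rows) != len(predicted_rows):
        raise ValueError("sample counts differ")
    total = 0.0
    count = 0
    for true_row, predicted_row in zip(true_rows, predicted_rows):
        if len(true_row) != len(predicted_row):
            raise ValueError("feature counts differ")
        for true_value, predicted_value in zip(true_row, predicted_row):
            total += (true_value - predicted_value) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return total / count


# Toy values (made up): two samples, two features of one held-out modality.
held_out = [[0.0, 1.0], [2.0, 3.0]]
reconstructed = [[0.0, 1.0], [2.0, 5.0]]
mse = reconstruction_mse(held_out, reconstructed)
print(mse)  # 1.0, since only one of four entries is off, by 2
```

Under this convention, a lower value means the reconstructed modality is closer to the measured one; the scale of the score depends on how the input features were normalized.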
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 65