Keywords: Deep Learning, ctDNA Prediction, Cancer, Generative Models, Whole Genome Sequencing
TL;DR: We propose a deep generative mixture model that aggregates noisy WGS read-level signals for ctDNA estimation, showing proof-of-concept recovery in synthetic data and preliminary latent structure in real plasma samples.
Abstract: Circulating tumor DNA (ctDNA) provides a minimally invasive measure of tumor
burden, but estimating ctDNA fraction from plasma is difficult when
tumor-derived molecules are rare relative to the cell-free DNA background. We
introduce a deep generative mixture model that estimates sample-level ctDNA
fraction from read-base events in whole-genome sequencing (WGS). Each event is modeled
as a noisy observation from either a healthy-derived or a tumor-derived component,
with the sample-specific ctDNA fraction as the mixing weight. To capture batch effects and
patient-specific background noise, sample embeddings are learned jointly with a
decoder that uses read-level covariates, sequence context, and tumor-catalog
information to model allele probabilities. Self-normalized
inverse-probability weighting accounts for stratified genome-wide sampling, and a single
trained model supports both tumor-informed and tumor-agnostic inference. We apply the
framework to serial plasma WGS from a stage III colorectal
cancer cohort.
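The mixture idea in the abstract can be illustrated with a minimal sketch that makes two loud assumptions. First, the per-event likelihoods under the healthy and tumor components are treated as given inputs; in the described model they would come from the learned decoder, and here they are set by hand. Second, the mixing weight is fitted with a plain weighted EM, which is a swapped-in illustration and not necessarily the authors' inference procedure. All function names (`snipw_weights`, `estimate_fraction`, `weighted_log_lik`) and the simulation settings (error rate, tumor allele probability, stratum inclusion probabilities) are hypothetical. Events are drawn from two strata with different inclusion probabilities and reweighted with self-normalized inverse-probability weights.

```python
import math
import random


def snipw_weights(inclusion_probs):
    """Self-normalized inverse-probability weights: (1/pi_i) / sum_j (1/pi_j)."""
    raw = [1.0 / p for p in inclusion_probs]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_log_lik(f, lik_healthy, lik_tumor, weights):
    """Weighted log-likelihood of the two-component mixture with tumor weight f."""
    return sum(w * math.log((1.0 - f) * h + f * t)
               for w, h, t in zip(weights, lik_healthy, lik_tumor))


def estimate_fraction(lik_healthy, lik_tumor, weights, f_init=0.1, n_iter=500, tol=1e-10):
    """Weighted EM for the mixing weight f (hypothetical stand-in for the real fit).

    lik_healthy[i], lik_tumor[i]: likelihood of event i's observed base under the
    healthy-derived and tumor-derived components. weights must sum to 1.
    """
    f = f_init
    for _ in range(n_iter):
        # E-step: posterior probability that each event is tumor-derived.
        resp = [f * t / ((1.0 - f) * h + f * t) for h, t in zip(lik_healthy, lik_tumor)]
        # M-step: the weighted mean responsibility maximizes the weighted log-likelihood.
        f_new = sum(w * r for w, r in zip(weights, resp))
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Hypothetical simulation: alt-allele observations at tumor-catalog sites.
random.seed(0)
f_true, err, p_alt_tumor = 0.05, 0.002, 0.5
lik_h, lik_t, pis = [], [], []
for _ in range(40000):
    # Two sampling strata with different inclusion probabilities.
    pi = 0.2 if random.random() < 0.5 else 0.8
    if random.random() >= pi:
        continue  # event not sampled
    tumor = random.random() < f_true
    alt = random.random() < (p_alt_tumor if tumor else err)
    lik_h.append(err if alt else 1.0 - err)
    lik_t.append(p_alt_tumor if alt else 1.0 - p_alt_tumor)
    pis.append(pi)

w = snipw_weights(pis)
f_hat = estimate_fraction(lik_h, lik_t, w)
print(round(f_hat, 3))
```

Because the weighted log-likelihood is a positively weighted sum of logs of functions linear in `f`, it is concave, so the EM fixed point is its maximizer. In this toy setting the ctDNA fraction is the same in every stratum, so the weights mainly change variance; they matter more when the model is misspecified differently across strata.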
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 128