A Deep Generative Mixture Model for Enhancing Circulating Tumor DNA Estimation

Published: 28 May 2026, Last Modified: 28 May 2026, GenBio 2026 Poster, CC BY 4.0
Keywords: Deep Learning, ctDNA Prediction, Cancer, Generative Models, Whole Genome Sequencing
TL;DR: We propose a deep generative mixture model that aggregates noisy WGS read-level signals for ctDNA estimation, showing proof-of-concept recovery in synthetic data and preliminary latent structure in real plasma samples.
Abstract: Circulating tumor DNA (ctDNA) provides a minimally invasive measure of tumor burden, but estimating the ctDNA fraction from plasma is difficult when tumor-derived molecules are rare relative to the cell-free DNA background. We introduce a deep generative mixture model that estimates the sample-level ctDNA fraction from read-base events in whole-genome sequencing (WGS). Each event is modeled as a noisy observation from either a healthy-derived or a tumor-derived component, with the two components mixed according to the sample-specific ctDNA fraction. To capture batch effects and patient-specific background noise, sample embeddings are learned jointly with a decoder that models allele probabilities from read-level covariates, sequence context, and tumor-catalog information. Self-normalized inverse-probability weighting corrects for the stratified genome-wide sampling of events, and a single trained model supports both tumor-informed and tumor-agnostic inference. We apply the framework to serial plasma WGS from a stage III colorectal cancer cohort.
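To make the mixture likelihood concrete, here is a toy sketch. It is not the authors' implementation, and it rests on several assumptions. Each read-base event is assumed to already carry two probabilities, p_healthy and p_tumor. In the paper these come from a learned decoder; here they are hand-set constants. Each event then contributes (1 - f) * p_healthy + f * p_tumor to the likelihood of the ctDNA fraction f. Events from a stratified sample get inverse-probability weights 1/π, rescaled to sum to the number of events (self-normalized). Using weighted EM to estimate f under fixed components is a swapped-in choice for this illustration. The joint training of embeddings and decoder is not attempted. The names `ReadEvent`, `simulate`, and `estimate_fraction`, along with the two strata and their inclusion probabilities, are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ReadEvent:
    """One read-base event (hypothetical fields; probabilities assumed > 0)."""
    p_healthy: float       # P(observed base | healthy-derived molecule)
    p_tumor: float         # P(observed base | tumor-derived molecule)
    inclusion_prob: float  # probability that this event's stratum was sampled


def self_normalized_weights(events):
    """Inverse-probability weights 1/pi, rescaled so they sum to len(events)."""
    raw = [1.0 / e.inclusion_prob for e in events]
    total = sum(raw)
    return [len(events) * w / total for w in raw]


def weighted_log_likelihood(f, events, weights):
    """Sum of w_i * log((1 - f) * p_healthy_i + f * p_tumor_i)."""
    return sum(
        w * math.log((1.0 - f) * e.p_healthy + f * e.p_tumor)
        for e, w in zip(events, weights)
    )


def estimate_fraction(events, f0=0.1, tol=1e-7, max_iter=500):
    """Weighted EM for the mixing fraction f with fixed component likelihoods."""
    weights = self_normalized_weights(events)
    total_w = sum(weights)
    f = f0
    for _ in range(max_iter):
        # E-step: weighted sum of posterior probabilities of tumor origin.
        resp_sum = 0.0
        for e, w in zip(events, weights):
            tumor = f * e.p_tumor
            resp_sum += w * tumor / ((1.0 - f) * e.p_healthy + tumor)
        # M-step: the weighted mean responsibility maximizes the expected
        # weighted complete-data log-likelihood in f.
        f_new = resp_sum / total_w
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


def simulate(n_per_stratum, f_true, error_rate=0.01, seed=0):
    """Toy data: heterozygous tumor sites; sequencing error as healthy noise."""
    rng = random.Random(seed)
    events = []
    # Hypothetical strata: oversampled (pi = 0.5) vs. subsampled (pi = 0.05).
    for inclusion_prob in (0.5, 0.05):
        for _ in range(n_per_stratum):
            from_tumor = rng.random() < f_true
            p_alt = 0.5 if from_tumor else error_rate
            alt = rng.random() < p_alt
            events.append(ReadEvent(
                p_healthy=error_rate if alt else 1.0 - error_rate,
                p_tumor=0.5,  # P(alt | tumor) = P(ref | tumor) = 0.5
                inclusion_prob=inclusion_prob,
            ))
    return events


if __name__ == "__main__":
    events = simulate(10_000, f_true=0.05)
    print(f"estimated ctDNA fraction: {estimate_fraction(events):.4f}")
```

In this toy setup both strata share the same tumor fraction, so the weights mainly show the bookkeeping; the estimate should land near the simulated 0.05. Weighting matters when the sampling probability of a stratum is correlated with its signal. The weighted log-likelihood is concave in f, because it is a sum of logs of functions that are linear in f, so EM converges to the global maximum.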
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 128