ECLIPSE: A Composable Pipeline for Predicting ecDNA Formation, Evolution, and Therapeutic Vulnerabilities in Cancer

Published: 03 Mar 2026, Last Modified: 26 Apr 2026, Venue: ICLR 2026 Workshop FM4Science Poster, License: CC BY 4.0
Keywords: foundation models for genomics, physics-informed neural networks, science-first design, real-world clinical impact, cancer biology, causal discovery, neural stochastic differential equations, scientific benchmarking, methodological rigor, computational oncology, therapeutic target discovery, zero-shot transfer, domain-constrained learning, biomedical foundation models, scientific ML pipelines
TL;DR: A science-first ecDNA framework that exposes flawed benchmarks, encodes binomial segregation physics into neural SDEs (r > 0.997), and applies causal inference to reach 80× enrichment of therapeutic targets over chance, showing that domain rigor outperforms architectural complexity.
Abstract: Extrachromosomal DNA (ecDNA) represents one of the most pressing challenges in cancer biology: circular DNA structures that amplify oncogenes, evade targeted therapies, and drive tumor evolution in ∼30% of aggressive cancers. Despite its clinical importance, computational ecDNA research has been built on broken foundations. We discover that existing benchmarks suffer from circular reasoning (models are trained on features that already require knowing ecDNA status), which artificially inflates performance from AUROC 0.724 to 0.967. We introduce ECLIPSE, the first methodologically sound framework for ecDNA analysis, comprising three modules that transform how we predict, model, and target these structures. ECDNA-FORMER achieves AUROC 0.812 using only standard genomic features, demonstrating for the first time that ecDNA status is predictable without specialized sequencing, and that careful feature curation matters more than complex architectures. CIRCULARODE captures ecDNA's unique stochastic dynamics through physics-constrained neural SDEs, achieving r > 0.997 on experimental data via zero-shot transfer. VULNCAUSAL applies causal inference to identify therapeutic vulnerabilities, achieving 80× enrichment over chance (p < 10⁻⁵) and 3.7× higher validation than standard approaches by filtering spurious correlations. Together, these modules establish rigorous baselines for an emerging application area and reveal a broader lesson: in high-stakes biomedical ML, methodological rigor (eliminating leakage, encoding domain physics, addressing confounding) outweighs architectural innovation. ECLIPSE provides both the tools and the template for principled computational oncology.
Submission Number: 19
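
Illustrative sketch (not from the paper): the abstract refers to "binomial segregation physics" without spelling it out. The Python below is a minimal toy model of what that phrase commonly means for ecDNA, under assumptions that are ours and not the authors': every ecDNA copy in a cell is replicated once before division, each replicated copy goes to one of the two daughters independently with probability 0.5, and there is no selection. The function names `segregate` and `simulate` are hypothetical and are not part of CIRCULARODE.

```python
import random


def segregate(copies: int, rng: random.Random) -> tuple[int, int]:
    """Split the replicated ecDNA copies of one cell between two daughters.

    Assumption: all copies replicate once (2 * copies in total) and each
    replicated copy lands in daughter A independently with probability 0.5,
    so daughter A receives a Binomial(2 * copies, 0.5) draw.
    """
    total = 2 * copies
    to_a = sum(rng.random() < 0.5 for _ in range(total))
    return to_a, total - to_a


def simulate(initial_copies: int, generations: int, seed: int = 0) -> list[int]:
    """Grow a population from one cell with neutral binomial segregation."""
    rng = random.Random(seed)
    population = [initial_copies]
    for _ in range(generations):
        next_population = []
        for copies in population:
            # Each cell divides into two daughters that share its copies.
            next_population.extend(segregate(copies, rng))
        population = next_population
    return population


if __name__ == "__main__":
    cells = simulate(initial_copies=10, generations=8)
    mean = sum(cells) / len(cells)
    var = sum((c - mean) ** 2 for c in cells) / len(cells)
    print(f"cells={len(cells)} mean={mean:.2f} variance={var:.2f}")
```

In this toy model the mean copy number per cell is conserved exactly, while the copy-number variance of a cell grows by half the mean per generation in expectation. That widening heterogeneity is the kind of behavior a physics-constrained dynamics model would be expected to reproduce.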