Continual Causal Refinement: Learning from Sequential Perturbation Data

Published: 23 May 2026, Last Modified: 23 May 2026, Venue: CATS@ICML26 Poster, Readers: Everyone, License: CC BY 4.0
Keywords: continual learning, foundation models, post-training, causal representation learning, perturbation data, regulatory genomics, sequence-to-function models, replay, catastrophic forgetting
Abstract: Biological foundation models are often trained on large observational datasets collected across genomes, cells, tissues, or conditions, yet many downstream uses require accurate prediction under targeted perturbations. Perturbation data provide the relevant supervision, but each experiment is expensive and localized, producing small batches rather than broad resampling of the pretraining distribution. Refining a model from these batches therefore creates a continual learning problem: new perturbation evidence must be consolidated without erasing broad pretrained competence, and what is learned locally should transfer to future perturbation settings. We formalize this regime as continual causal refinement and instantiate it in regulatory genomics with a controlled benchmark in which sequence-to-function models are sequentially updated using in silico perturbation libraries. Benchmarking standard continual learning methods shows that naive fine-tuning causes substantial forgetting, whereas replay preserves observational performance and improves transfer to held-out perturbation libraries. Continual causal refinement provides a new avenue to iteratively refine biological foundation models with perturbation data.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 74