Track: Full Paper Track
Keywords: perturbation modeling, extrapolation, distributional regression, representation learning, identifiability, omics data
TL;DR: We phrase perturbation modeling as a distributional regression task, derive identifiability and extrapolation guarantees, and propose a practical estimation method that outperforms CPA.
Abstract: We consider the problem of modelling the effects of perturbations such as gene knockdowns or drug combinations on low-level measurements like RNA sequencing data. Specifically, given data collected under some perturbations, we aim to predict the distribution of measurements for new perturbations. To address this challenging extrapolation task, we posit that perturbations act additively in a suitable, unknown embedding space. More precisely, we formulate the generative process underlying the observed data as a latent variable model, in which perturbations amount to mean shifts in latent space. We prove that the representation and perturbation effects are identifiable up to affine transformation and use this to characterize the class of unseen perturbations for which we obtain extrapolation guarantees. To estimate the model from data, we propose the perturbation distribution autoencoder (PDAE), which is trained by maximising the distributional similarity between true and predicted perturbation distributions. The trained model can then be used to predict previously unseen perturbation distributions. Preliminary empirical evidence suggests that PDAE compares favourably to CPA (Lotfollahi et al., 2023) and other baselines at predicting the effects of unseen perturbations.
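The recipe the abstract describes — encode control measurements, apply a learned additive mean shift per perturbation in the latent space, decode, and train by matching predicted and observed distributions rather than individual samples — can be sketched as below. This is a minimal illustration under assumptions the abstract does not specify: a Gaussian-kernel MMD as the distributional similarity measure and toy dimensions; all names (`PDAE`, `gaussian_mmd`) are hypothetical and not the authors' implementation.

```python
# Hypothetical sketch of the PDAE idea; MMD objective and shapes are assumptions.
import torch
import torch.nn as nn

class PDAE(nn.Module):
    def __init__(self, x_dim, z_dim, n_perturbations):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
        # One latent shift vector per perturbation: perturbations act
        # additively (as mean shifts) in the learned embedding space.
        self.shifts = nn.Embedding(n_perturbations, z_dim)

    def forward(self, x_control, pert_idx):
        z = self.encoder(x_control)            # embed control samples
        z_shifted = z + self.shifts(pert_idx)  # additive mean shift in latent space
        return self.decoder(z_shifted)         # predicted perturbed samples

def gaussian_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate with a Gaussian kernel between two sample sets."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# One training step: match the *distribution* predicted for perturbation p
# to the observed distribution under p, with no sample-level pairing.
model = PDAE(x_dim=50, z_dim=10, n_perturbations=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_control = torch.randn(128, 50)             # toy control measurements
x_pert = torch.randn(128, 50) + 0.5          # toy measurements under perturbation p
p = torch.full((128,), 3, dtype=torch.long)  # index of perturbation p

loss = gaussian_mmd(model(x_control, p), x_pert)
loss.backward()
opt.step()
```

Predicting an unseen combination would then amount to composing learned shift vectors in latent space before decoding, consistent with the additivity assumption.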
Submission Number: 77