Keywords: causal representation learning, causal bandit, reward-oriented
TL;DR: Causal representation learning for downstream tasks (formalized as a reward), with algorithms and regret bounds.
Abstract: Causal representation learning (CRL) is the process of disentangling the *latent* low-dimensional, causally related generating factors underlying high-dimensional observable data. Extensive recent studies have characterized CRL identifiability and the *perfect* recovery of the latent variables and their attendant causal graph. This paper introduces the notion of *reward-oriented* CRL, whose purpose is to move away from perfectly learning the latent representation and instead learn it only to the extent needed for optimizing a desired downstream task (reward). In reward-oriented CRL, perfectly learning the latent representation can be excessive; instead, the representation should be learned at the *coarsest* level sufficient for optimizing the desired task. Reward-oriented CRL is formalized as optimizing a desired function of the observable data over the space of all possible interventions, with the focus on linear causal and transformation models. To sequentially identify the optimal subset of interventions, an adaptive exploration algorithm is designed that learns the latent causal graph and the latent variables to the extent needed to identify the best intervention. It is shown that for an $n$-dimensional latent space and a $d$-dimensional observation space, over a horizon $T$ the algorithm's regret scales as $\tilde O(d^{\frac{1}{3}}n^{\frac{1}{3}}u^{\frac{2}{3}}T^{\frac{2}{3}} + u\sqrt{T})$, where $u$ measures the total uncertainty in the graph estimates. Furthermore, an almost-matching lower bound is shown to scale as $\Omega(d^{\frac{1}{3}}n^{\frac{1}{3}}p^{\frac{2}{3}}T^{\frac{2}{3}} + p\sqrt{T})$, in which $u$ is replaced by $p$, the number of causal paths in the graph.
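A minimal sketch, under stated assumptions rather than the paper's actual algorithm, of the kind of setup the abstract describes: a linear latent SCM over $n$ causal variables, a linear transformation to $d$-dimensional observations, and a downstream reward that is a function of the observables. The best single-node intervention is estimated by comparing mean rewards, without requiring exact recovery of the latent model. All names (`A`, `G`, `reward_weights`, `sample`) and the hard-intervention mechanics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 10                                 # latent and observation dimensions

# Hypothetical model parameters (assumptions for illustration only).
A = np.tril(rng.normal(size=(n, n)), k=-1)   # strictly lower-triangular edge weights (a DAG)
G = rng.normal(size=(d, n))                  # linear transformation (mixing) matrix
reward_weights = rng.normal(size=d)          # reward as a linear function of the observation

def sample(do=None, value=1.0):
    """Sample one observation and reward, optionally under a hard intervention do(Z_i = value)."""
    z = np.zeros(n)
    noise = rng.normal(size=n)
    for i in range(n):                       # ancestral sampling in topological order
        z[i] = A[i] @ z + noise[i]
        if do == i:
            z[i] = value                     # the intervention overrides node i's mechanism
    x = G @ z                                # high-dimensional observable
    return x, reward_weights @ x             # (observation, downstream reward)

# Reward-oriented view: pick the intervention with the best estimated mean reward,
# rather than recovering A or G exactly.
mean_rewards = {i: np.mean([sample(do=i)[1] for _ in range(500)]) for i in range(n)}
print("estimated best single-node intervention:", max(mean_rewards, key=mean_rewards.get))
```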
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 25255