CROME: Covariate-Balanced Causal Representation Learning for Composite Outcomes via Multi-Task Estimation in Electronic Health Records

Kan Chen; Inyoung Choi; Ravi B Parikh; Qi Long

11 Sept 2025 (modified: 19 Nov 2025). ICLR 2026 Conference Withdrawn Submission. License: CC BY 4.0.

Keywords: Causal inference; Causal representation learning.

TL;DR: We present CROME, a causal representation learning framework using multi-task estimation to model treatment effects across multiple outcomes, improving generalization and causal validity in EHR studies.

Abstract: Estimating treatment effects on composite outcomes is challenging, particularly in high-stakes decision-making domains such as healthcare, where multiple related outcomes jointly inform clinical decisions. Existing approaches often simplify the problem by collapsing the component outcomes into a single target. This overlooks the underlying structure, introduces modeling bias, and limits interpretability. In this work, we propose CROME (Causal Representation for Composite Outcome via Multi-task Estimation), a framework that combines representation learning, multi-task learning (MTL), and covariate balancing to predict component-level potential outcomes, which are then aggregated through a user-specified utility function. CROME jointly learns a representation shared across tasks together with outcome-specific prediction heads, enabling accurate and interpretable estimation of treatment effects on composite outcomes. Our theoretical results show that, under mild conditions, CROME achieves lower generalization error than both MTL without a shared representation and single-task baselines. Empirical results on synthetic and semi-synthetic datasets inspired by the Infant Health and Development Program (IHDP), and on an oncology electronic health records (EHR) dataset, confirm the advantages of our approach over existing methods, including improved accuracy and interpretability. Our framework provides a principled and flexible solution for causal inference in complex, multi-outcome clinical settings, with broad applicability to both patient-reported and EHR-derived data.
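The abstract describes the architecture only at a high level. As a rough illustration, here is a minimal NumPy sketch of the forward pass and objective that such a model might use. It relies on several assumptions that the abstract does not state. The layout is TARNet-style, with one shared ReLU encoder plus a control head and a treated head for each component outcome. A mean-difference (linear MMD) penalty stands in for the covariate-balancing term, and the utility is a weighted sum. All names (`init_params`, `multitask_loss`, `composite_cate`, `alpha`, the utility weights) are hypothetical. No training loop is shown.

```python
import numpy as np

# Minimal sketch, NOT the authors' implementation. Assumptions: TARNet-style
# heads, a linear-MMD balancing penalty, and a user-supplied utility u: (n, K) -> (n,).

def relu(z):
    return np.maximum(z, 0.0)

def init_params(d_in, d_rep, n_tasks, rng):
    """Hypothetical layout: one shared encoder and, for each task k,
    two linear heads (index 0 = control, index 1 = treated)."""
    return {
        "W_phi": rng.normal(scale=0.1, size=(d_in, d_rep)),
        "b_phi": np.zeros(d_rep),
        "heads": [[(rng.normal(scale=0.1, size=d_rep), 0.0) for _ in range(2)]
                  for _ in range(n_tasks)],
    }

def represent(params, X):
    """Shared representation Phi(x), reused by every task."""
    return relu(X @ params["W_phi"] + params["b_phi"])

def potential_outcomes(params, X):
    """Predicted mu_k(x, t) for every task k and arm t; shape (n, K, 2)."""
    Phi = represent(params, X)
    K = len(params["heads"])
    mu = np.empty((X.shape[0], K, 2))
    for k, arms in enumerate(params["heads"]):
        for t, (w, b) in enumerate(arms):
            mu[:, k, t] = Phi @ w + b
    return mu

def balance_penalty(Phi, T):
    """Stand-in balancing term: squared distance between the mean treated
    and mean control representations (linear MMD)."""
    diff = Phi[T == 1].mean(axis=0) - Phi[T == 0].mean(axis=0)
    return float(diff @ diff)

def multitask_loss(params, X, T, Y, alpha=1.0):
    """Factual MSE averaged over samples and tasks, plus alpha * balance term.
    Y has shape (n, K); only the observed arm's prediction is scored."""
    mu = potential_outcomes(params, X)
    factual = np.where(T[:, None] == 1, mu[:, :, 1], mu[:, :, 0])
    mse = float(np.mean((factual - Y) ** 2))
    return mse + alpha * balance_penalty(represent(params, X), T)

def composite_cate(params, X, utility):
    """Aggregate predicted component outcomes with the utility, then take
    the treated-minus-control difference per individual."""
    mu = potential_outcomes(params, X)
    return utility(mu[:, :, 1]) - utility(mu[:, :, 0])

# Usage on random placeholder data (no real EHR or IHDP data).
rng = np.random.default_rng(0)
n, d, K = 200, 5, 3
X = rng.normal(size=(n, d))
T = rng.integers(0, 2, size=n)
Y = rng.normal(size=(n, K))
params = init_params(d, 8, K, rng)
loss = multitask_loss(params, X, T, Y, alpha=0.5)
weights = np.array([0.5, 0.3, 0.2])  # hypothetical utility weights
tau = composite_cate(params, X, lambda m: m @ weights)
print(f"loss={loss:.4f}, tau shape={tau.shape}")
```

With a linear utility like this one, the composite effect equals the same weighted combination of the component-level effects. A nonlinear utility, such as a threshold on a combined score, does not decompose that way. This illustrates why keeping component-level predictions can be more flexible than collapsing the outcomes up front. In practice, the encoder and heads would be fitted by minimizing an objective like `multitask_loss` with an autodiff framework.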

Supplementary Material: zip

Primary Area: causal reasoning

Submission Number: 4155
