Sanity Checking Causal Representation Learning on a Simple Real-World System

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Oral · CC BY 4.0
TL;DR: We provide a sanity test for CRL methods and their underlying theory, based on a carefully designed, real, physical system whose data-generating process matches the core assumptions of CRL, and where these methods are expected to work.
Abstract: We evaluate methods for causal representation learning (CRL) on a simple, real-world system where these methods are expected to work. The system consists of a controlled optical experiment specifically built for this purpose, which satisfies the core assumptions of CRL and where the underlying causal factors---the inputs to the experiment---are known, providing a ground truth. We select methods representative of different approaches to CRL and find that they all fail to recover the underlying causal factors. To understand the failure modes of the evaluated algorithms, we perform an ablation on the data by substituting the real data-generating process with a simpler synthetic equivalent. The results reveal a reproducibility problem, as most methods already fail on this synthetic ablation despite its simple data-generating process. Additionally, we observe that common assumptions on the mixing function are crucial for the performance of some of the methods but do not hold in the real data. Our efforts highlight the contrast between the theoretical promise of the state of the art and the challenges in its application. We hope the benchmark serves as a simple, real-world sanity check to further develop and validate methodology, bridging the gap towards CRL methods that work in practice. We make all code and datasets publicly available at <anonymized>.
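The paper's exact evaluation protocol is described in the full text; as a minimal illustrative sketch of how recovery of known causal factors is commonly quantified in the CRL literature, the snippet below computes a mean correlation coefficient (MCC) between learned representations and ground-truth factors. The function name, the use of Pearson correlation, and the optimal-permutation matching are assumptions for illustration, not the authors' implementation.

```python
# Sketch: score how well learned latents recover known ground-truth factors,
# assuming an MCC-style metric (not necessarily the paper's exact protocol).
import numpy as np
from scipy.optimize import linear_sum_assignment


def mean_correlation_coefficient(z_learned, z_true):
    """MCC between learned latents and ground-truth causal factors.

    z_learned: (n_samples, d) array of inferred representations.
    z_true:    (n_samples, d) array of known causal factors (ground truth).
    """
    d = z_true.shape[1]
    # Absolute Pearson cross-correlations between learned and true dimensions.
    corr = np.abs(np.corrcoef(z_learned.T, z_true.T)[:d, d:])
    # Match each learned dimension to one true factor, maximizing total correlation.
    rows, cols = linear_sum_assignment(-corr)
    return corr[rows, cols].mean()


# Hypothetical usage with synthetic data: a permuted, noisy copy of the true
# factors should score close to 1.0, while unrelated latents score near 0.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(1000, 3))
z_learned = z_true[:, [2, 0, 1]] + 0.1 * rng.normal(size=(1000, 3))
print(mean_correlation_coefficient(z_learned, z_true))
```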
Lay Summary: Making machine learning algorithms reason about the world using causal models is an active and open field of research, which promises to overcome many of the great challenges that current algorithms face. Progress in this field is theory-driven, and evidence that these theoretical advances translate to the real world is largely missing. We address this gap between theory and real-world application by designing an experimental sanity check for algorithms that are meant to learn causal variables and models from observational data. Our setup closely follows the assumptions required by the methods' theory and is so simple that we would expect the tested methods to succeed easily. However, our results show that none of the methods we examine pass our sanity check, indicating that existing approaches are not readily applicable to realistic data. Our work provides a controlled environment in which researchers can test existing algorithms or develop new ones, with a focus on guiding development with real-world data in mind. We show that there is still a significant gap between theoretical advances and real-world applicability, and we hope that our framework can break this gap down into manageable steps.
Link To Code: https://github.com/simonbing/CRLSanityCheck
Primary Area: General Machine Learning->Representation Learning
Keywords: causal representation learning, benchmarks, causality
Submission Number: 7212