Abstract: Causal machine learning has the potential to revolutionize decision-making by combining the predictive power of machine learning algorithms with the theory of causal inference. However, these methods remain underutilized by the broader machine learning community, in part because current empirical evaluations do not allow their reliability and robustness to be assessed, undermining their practical utility. Specifically, one of the community's principal criticisms is the field's extensive reliance on synthetic experiments. We argue, on the contrary, that **synthetic experiments are essential and necessary to precisely assess and understand the capabilities of causal machine learning methods**. To substantiate our position, we critically review current evaluation practices, spotlight their shortcomings, and propose a set of principles for conducting rigorous empirical analyses with synthetic data. Adopting the proposed principles will enable comprehensive evaluations that build trust in causal machine learning methods, driving their broader adoption and impactful real-world use.
Lay Summary: Causal machine learning algorithms have the potential to revolutionize decision-making. However, they remain underutilized because existing evaluation methodologies do not adequately assess their reliability and robustness.
In this work, we argue that synthetic experiments are essential and necessary to precisely assess and understand the capabilities of causal machine learning methods. To support our position, we critically review current evaluation practices, highlight their weaknesses through concrete experiments, and propose a set of principles for conducting rigorous evaluations with synthetic data.
Adopting our principles will enable rigorous evaluation, build trust, and drive broader real-world adoption of causal machine learning methods.
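To make the position concrete, the following is a minimal illustrative sketch (not taken from the paper or the linked repository) of why synthetic data enables precise evaluation: because the data-generating process is fully specified, the true conditional average treatment effect is known, so an estimator's error can be computed exactly. The particular DGP, the T-learner estimator, and the PEHE metric below are illustrative assumptions, not the benchmark proposed by the authors.

```python
# Illustrative sketch: evaluating a causal estimator against a known ground truth,
# which is only possible because the data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 2000, 5

# Covariates, confounded treatment assignment, and outcomes with a known CATE.
X = rng.normal(size=(n, d))
true_cate = 1.0 + 2.0 * X[:, 0]              # ground-truth effect tau(x)
propensity = 1.0 / (1.0 + np.exp(-X[:, 1]))  # treatment depends on covariates
T = rng.binomial(1, propensity)
Y = X[:, 0] + X[:, 1] + T * true_cate + rng.normal(scale=0.5, size=n)

# T-learner: separate outcome models for treated and control units.
mu1 = RandomForestRegressor(random_state=0).fit(X[T == 1], Y[T == 1])
mu0 = RandomForestRegressor(random_state=0).fit(X[T == 0], Y[T == 0])
cate_hat = mu1.predict(X) - mu0.predict(X)

# Because the DGP is synthetic, the estimation error (here, PEHE) is exact;
# with observational data the true effect is never observed.
pehe = np.sqrt(np.mean((cate_hat - true_cate) ** 2))
print(f"PEHE against the known ground-truth CATE: {pehe:.3f}")
```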
Link To Code: https://github.com/panispani/causalml-needs-synth-eval
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: Causal Machine Learning, Synthetic Experiments, Empirical Evaluation
Submission Number: 282