Optimal Treatment Assignment from Observational Data: A Decision-focused Learning Approach via Pseudo Labels

Published: 02 Mar 2026, Last Modified: 16 Mar 2026
Venue: ICLR 2026 Workshop AIMS
License: CC BY 4.0
Keywords: causal decision-making, decision-focused learning, treatment assignment problem
TL;DR: We improve decision quality in causal decision-making from observational data by integrating a pseudo-label-based decision-focused learning approach.
Abstract: Causal decision-making (CDM) is a central problem in causal inference because it directly measures the final utility generated by causal effect estimation. Existing work typically handles CDM with the predict-then-optimize framework, which combines modules from machine learning (ML) and operations research: a causal ML model first predicts the treatment effect, and an optimization step then solves the decision-making problem using those predictions. Because prediction errors from the ML model propagate into the optimization step, the final decision is often suboptimal. Decision-Focused Learning (DFL) is an end-to-end modeling paradigm that incorporates the decision loss into the loss function used to train the prediction model, so that the ML model is trained directly for decision quality. Applying DFL to CDM problems in general is non-trivial, however. The core challenge is the counterfactual problem: the ground-truth treatment effect is never observed for any individual, so the decision loss cannot be computed and DFL training cannot proceed. In this study, we first define a generalized formulation of the causal treatment assignment problem and theoretically demonstrate the potential advantages of DFL in this context. We then propose Decision-Focused Learning via Pseudo Labels (DFL-PL), an approach that improves the learning process of traditional two-step meta-learner approaches. By augmenting the training pipeline with pseudo-outcomes, our approach makes it possible to compute the decision loss and backpropagate it to train the model. Finally, we validate the effectiveness of the proposed algorithm on both synthetic datasets and real-world treatment assignment data from Didi Chuxing.
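Illustration: The two-step predict-then-optimize baseline and the role of pseudo-outcomes described in the abstract can be sketched as follows. This is a minimal, hypothetical example and not the DFL-PL algorithm itself. It assumes a known constant propensity `e`, uses the standard transformed-outcome pseudo-label as a stand-in for whichever pseudo-outcome the paper adopts, and takes "treat at most `k` units" as the assignment problem. All function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def transformed_outcome(y, t, e):
    # Pseudo-label whose conditional mean equals the treatment effect
    # when e is the true propensity (assumed known here).
    return y * (t - e) / (e * (1.0 - e))

def assign_top_k(scores, k):
    # Assumed decision problem: treat the k units with the largest scores.
    z = np.zeros(len(scores), dtype=int)
    z[np.argsort(-scores)[:k]] = 1
    return z

def pseudo_regret(pred, pseudo, k):
    # Decision loss measured against pseudo-labels: value of the best
    # assignment under the pseudo-labels minus the value achieved by
    # the assignment induced by the predictions.
    best = pseudo[assign_top_k(pseudo, k) == 1].sum()
    got = pseudo[assign_top_k(pred, k) == 1].sum()
    return best - got

# Synthetic data (hypothetical): the true effect tau is never observed in practice.
rng = np.random.default_rng(0)
n, e, k = 2000, 0.5, 500
x = rng.normal(size=(n, 2))
tau = 1.0 + x[:, 0]
t = rng.binomial(1, e, size=n)
y = x[:, 1] + tau * t + rng.normal(scale=0.1, size=n)

# Step 1 (predict): least-squares fit of the pseudo-labels on the features.
pseudo = transformed_outcome(y, t, e)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, pseudo, rcond=None)
tau_hat = X @ beta

# Step 2 (optimize): solve the assignment problem on the predictions.
z = assign_top_k(tau_hat, k)
regret = pseudo_regret(tau_hat, pseudo, k)
```

In this sketch the regret is only evaluated after training. A decision-focused variant would instead use a differentiable surrogate of `pseudo_regret` as the training loss, which is the gap that the pseudo-labels are meant to close.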
Track: Long Paper
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 63