Semi-supervised batch learning from logged data

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Semi-supervised batch learning, off-policy learning, IPS estimator, learning bounds
TL;DR: We propose a theoretically inspired method for semi-supervised batch learning from logged data that contain both known-reward and missing-reward samples.
Abstract: Offline policy learning methods aim to learn a policy from logged data that record a context, an action, and a reward for each sample. In this work we build on the counterfactual risk minimization framework, which additionally assumes access to propensity scores. We propose learning methods for problems in which the rewards of some samples are missing, so that the logged data contain both samples with rewards and samples without them. We refer to this setting as semi-supervised batch learning from logged data; it arises in a wide range of application domains. To address it, we derive a new upper bound on the true risk under inverse propensity score (IPS) estimation. Using this bound, we propose a regularized semi-supervised batch learning method whose regularization term is reward-independent and can therefore be evaluated on the logged missing-reward samples. Consequently, even though reward feedback is available for only some samples, a parameterized policy can be learned that also leverages the missing-reward samples. Experiments on benchmark datasets indicate that the resulting algorithms learn policies that perform better than the logging policies.
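To make the structure of such an objective concrete, here is a minimal Python sketch, assuming a linear softmax policy and per-sample losses given as negated rewards (the abstract speaks of rewards; the loss convention is an assumption borrowed from counterfactual risk minimization). The IPS risk term uses only the samples whose loss was logged, while the regularizer depends only on contexts, actions, propensities, and the policy, so it is evaluated on every logged sample. The specific regularizer below, the empirical second moment of the importance weights, is a hypothetical stand-in chosen for illustration and is not the reward-independent term derived in the paper; all function names are likewise hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax_policy(theta: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Action probabilities of a linear softmax policy (one weight vector per action)."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in theta]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ips_risk(theta, contexts, actions, propensities,
             losses: Sequence[Optional[float]]) -> float:
    """IPS risk estimate over the samples whose loss was observed (loss is not None)."""
    total, n = 0.0, 0
    for x, a, p0, loss in zip(contexts, actions, propensities, losses):
        if loss is None:  # missing-reward sample: skipped by the IPS term
            continue
        weight = softmax_policy(theta, x)[a] / p0  # importance weight pi(a|x) / pi0(a|x)
        total += loss * weight
        n += 1
    return total / n if n else 0.0


def reward_free_penalty(theta, contexts, actions, propensities) -> float:
    """Hypothetical reward-independent regularizer: mean squared importance weight.

    It never reads a reward, so it is computed on ALL logged samples,
    including those whose reward is missing.
    """
    total = 0.0
    for x, a, p0 in zip(contexts, actions, propensities):
        weight = softmax_policy(theta, x)[a] / p0
        total += weight * weight
    return total / len(contexts)


def semi_supervised_objective(theta, contexts, actions, propensities,
                              losses, lam: float) -> float:
    """IPS risk on known-reward samples plus lam times the reward-free penalty on all samples."""
    return (ips_risk(theta, contexts, actions, propensities, losses)
            + lam * reward_free_penalty(theta, contexts, actions, propensities))


if __name__ == "__main__":
    theta = [[0.0, 0.0], [0.0, 0.0]]          # uniform policy over two actions
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # logged contexts
    A = [0, 1, 0]                             # logged actions
    P0 = [0.5, 0.5, 0.5]                      # logging propensities pi0(a|x)
    L = [-1.0, None, 0.0]                     # second sample's reward is missing
    print(semi_supervised_objective(theta, X, A, P0, L, lam=0.1))
```

Minimizing `semi_supervised_objective` over `theta` (for example with a gradient-based optimizer) would yield a learned policy, with `lam` trading off the IPS estimate against the regularizer. Samples marked `None` in `losses` still shape the objective through the penalty term, which is the mechanism the abstract relies on.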
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5165