Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

27 Sept 2018, 22:36 (edited 22 Feb 2019)
ICLR 2019 Conference Blind Submission
Readers: Everyone
Keywords:
Abstract:
