Cross-Entropy Estimators for Sequential Experiment Design with Reinforcement Learning

Published: 27 Oct 2023, Last Modified: 22 Dec 2023, RealML-2023
Keywords: Sequential design of experiments, reinforcement learning
TL;DR: A new method for sequential design of experiments based on reinforcement learning and a cross-entropy estimator
Abstract: Reinforcement learning can learn amortised policies for designing sequences of experiments. However, current methods rely on contrastive estimators of expected information gain, which require an exponential number of contrastive samples to achieve unbiased estimation. We propose an alternative lower-bound estimator based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method requires no contrastive samples, achieves more accurate estimates of high information gains, allows superior design policies to be learned, and is compatible with implicit probabilistic models. We assess our algorithm's performance on a range of tasks, covering continuous and discrete designs and explicit and implicit likelihoods.
Submission Number: 17
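
To make the idea of a cross-entropy lower bound concrete, the sketch below evaluates one standard bound of this kind, E[log q(theta | y)] + H[p(theta)], on a toy one-step linear-Gaussian model where the expected information gain is known in closed form. Everything here is an illustrative assumption rather than the paper's implementation: the model, the fixed Gaussian proposals, and the name `cross_entropy_eig_bound` are hypothetical, and no policy or proposal is learned.

```python
import numpy as np


def cross_entropy_eig_bound(d, sigma, q_mean_fn, q_var_fn, n=200_000, seed=0):
    """Monte Carlo estimate of E[log q(theta | y)] + H[p(theta)].

    Toy model (an assumption for illustration):
        theta ~ N(0, 1),  y | theta, d ~ N(d * theta, sigma^2).
    q(theta | y) is a Gaussian proposal with mean q_mean_fn(y), variance q_var_fn(y).
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n)                    # samples from the prior
    y = d * theta + sigma * rng.standard_normal(n)    # simulated outcomes
    mean, var = q_mean_fn(y), q_var_fn(y)
    # Gaussian log-density of the proposal at the true parameter samples.
    log_q = -0.5 * (np.log(2 * np.pi * var) + (theta - mean) ** 2 / var)
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)    # entropy of N(0, 1)
    return log_q.mean() + prior_entropy


d, sigma = 2.0, 1.0
true_eig = 0.5 * np.log(1.0 + d**2 / sigma**2)        # closed form for this model

# Proposal equal to the exact posterior: the bound is tight up to Monte Carlo error.
post_var = sigma**2 / (d**2 + sigma**2)
tight = cross_entropy_eig_bound(
    d, sigma, lambda y: d * y / (d**2 + sigma**2), lambda y: post_var
)

# Proposal that ignores the data (equal to the prior): the bound collapses to about 0.
loose = cross_entropy_eig_bound(d, sigma, lambda y: np.zeros_like(y), lambda y: 1.0)

print(f"true EIG={true_eig:.4f}  exact-posterior q={tight:.4f}  prior q={loose:.4f}")
```

No contrastive samples appear in the estimate: each draw of (theta, y) contributes a single proposal log-density, and the quality of the bound depends only on how closely the proposal matches the true posterior.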