Abstract: Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as quickly as possible. For this, we introduce a fixed-confidence piecewise constant bandit problem. Here, we sequentially query points in the domain and receive noisy evaluations of the function under bandit feedback. We provide instance-dependent lower bounds for the complexity of change point identification in this problem. These lower bounds illustrate that an optimal method should focus its sampling efforts adjacent to each of the change points, and the number of samples around each change point should be inversely proportional to the magnitude of the change.
Building on this, we devise a simple and computationally efficient variant of Track-and-Stop and prove that it is asymptotically optimal in many regimes. We support our theoretical findings with experimental results in synthetic environments demonstrating the efficiency of our method.
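To make the allocation claim concrete, here is a minimal toy sketch (not the paper's algorithm): a piecewise constant function over a finite set of arms with noisy bandit feedback, where the sampling weight assigned near each change point is taken inversely proportional to the jump magnitude, as the lower bounds suggest. All values below (means, noise level) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy piecewise constant function on 10 arms (hypothetical values):
# the mean jumps between arms 2-3 (gap 1.5) and arms 6-7 (gap 0.2).
means = np.array([0.0, 0.0, 0.0, 1.5, 1.5, 1.5, 1.5, 1.7, 1.7, 1.7])
sigma = 0.5  # noise level of the bandit feedback

def pull(arm):
    """One noisy evaluation of the piecewise constant function."""
    return means[arm] + sigma * rng.normal()

# Change points and their jump magnitudes.
gaps = np.abs(np.diff(means))
change_points = np.flatnonzero(gaps)  # indices i with means[i] != means[i+1]
magnitudes = gaps[change_points]      # here: [1.5, 0.2]

# Concentrate sampling effort next to each change point, with weight
# inversely proportional to the jump magnitude: smaller jumps are
# harder to localise, so they receive more samples.
weights = 1.0 / magnitudes
allocation = weights / weights.sum()
print(dict(zip(change_points.tolist(), allocation.round(3))))
```

In this toy instance the small 0.2 jump receives most of the budget, matching the intuition that subtle changes dominate the sample complexity.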
Lay Summary: In many settings, such as chemistry or manufacturing, tiny changes in the experimental input (e.g. temperature) can lead to large changes in the output (e.g. yield). Often it is necessary to locate the changes in input that lead to these big jumps in output. Moreover, it is often expensive to run these experiments at every possible input value. We therefore aim to find methods to locate the jumps as efficiently as possible.
To tackle this problem, we construct methods to sequentially choose an input (e.g. temperature) at which we will observe a noisy output. We can use previous observations to carefully select these points. We repeat the process until we can confidently stop and return the locations of the set of jumps, or change points. In this paper, we mathematically upper bound the expected time for our proposed methods to stop. We then show that these expected stopping times are theoretically optimal when we want high confidence that the returned change point locations are correct. We complement these results with experiments showing that our methods are effective in practice.
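The sample-then-stop loop described above can be sketched in miniature for a single candidate change point. This is only an illustration under assumed values (true outputs, noise level, target error probability), using a generic Hoeffding-style stopping rule rather than the paper's exact stopping condition: we alternate noisy observations on either side of the candidate jump and stop once the estimated jump exceeds its confidence width.

```python
import math
import random

random.seed(1)

# Hypothetical setup: two adjacent inputs whose true outputs differ by a jump.
TRUE = {"left": 0.0, "right": 1.0}
SIGMA = 0.5   # noise level of each observation
DELTA = 0.05  # target error probability (fixed confidence)

def observe(x):
    """One noisy experiment at input x."""
    return TRUE[x] + random.gauss(0.0, SIGMA)

sums = {"left": 0.0, "right": 0.0}
t = 0
while True:
    t += 1
    for x in sums:  # sample both sides of the candidate jump
        sums[x] += observe(x)
    gap_hat = abs(sums["right"] - sums["left"]) / t
    # Anytime confidence width (a standard generic choice, not the paper's rule).
    width = 2 * SIGMA * math.sqrt(2 * math.log(2 * t * t / DELTA) / t)
    if gap_hat > width:  # stop once the jump is statistically certain
        break

print(f"declared a change point after {t} rounds (estimated jump {gap_hat:.2f})")
```

The loop stops sooner for large jumps and later for small ones, which is exactly the dependence the expected stopping-time bounds capture.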
Primary Area: Theory->Online Learning and Bandits
Keywords: bandits, pure exploration, fixed confidence, change points
Submission Number: 14275