Abstract: Recent work reported that simple Bayesian optimization (BO) methods perform well for high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify underlying challenges that arise in high-dimensional BO and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods promoting local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Building on these findings, we propose MSR, a simple variant of MLE that achieves state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.
Lay Summary: Bayesian Optimization (BO) is a technique for optimizing functions that appear in engineering problems, hyperparameter optimization for machine learning, and other fields where observing the function requires considerable resources. BO learns a model of the function it aims to optimize and chooses new points to evaluate by trading off how well a new point is expected to perform against how uncertain the model is about that point's expected performance. For functions with many input parameters, the number of observations required to learn the model with sufficient accuracy grows so quickly that it was widely believed that either only functions with a moderate number of parameters can be learned efficiently with BO, or that the function has to satisfy additional assumptions that more sophisticated algorithms can exploit. Recent works have questioned this paradigm and shown that simple BO setups scale to many more parameters than previously believed. This paper investigates why previous methods did not scale well, why current methods do, and what the limitations of high-dimensional Bayesian optimization are. Based on our insights, we propose a simple yet effective method for high-dimensional BO and show that it is competitive with the state-of-the-art.
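The BO loop described above can be sketched in code. The following is a minimal illustration only, not the paper's method or its MSR variant: a GP surrogate with an RBF kernel, a length scale chosen by maximum likelihood over a small grid, and expected improvement as the acquisition function. The toy 1D objective, the length-scale grid, the jitter value, and all function names are choices made for this sketch.

```python
import math
import numpy as np

def rbf_kernel(X1, X2, ls):
    """Squared-exponential kernel with unit signal variance."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_fit(X, y, ls, jitter=1e-6):
    """Cholesky factor of the kernel matrix and posterior weights."""
    K = rbf_kernel(X, X, ls) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict(X, L, alpha, ls, Xs):
    """Posterior mean and variance at candidate points Xs."""
    Ks = rbf_kernel(X, Xs, ls)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.maximum(1.0 - (v ** 2).sum(0), 1e-12)
    return mu, var

def log_marginal_likelihood(X, y, ls):
    L, alpha = gp_fit(X, y, ls)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum()

def expected_improvement(mu, var, best):
    """EI for minimization: E[max(best - f(x), 0)] under the posterior."""
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(f, n_init=4, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_init, 1))
    y = np.array([f(x) for x in X])
    ls_grid = np.array([0.05, 0.1, 0.2, 0.5, 1.0])  # candidate length scales
    cand = np.linspace(0.0, 1.0, 200)[:, None]      # acquisition candidates
    for _ in range(n_iter):
        # Pick the length scale that maximizes the marginal likelihood (MLE).
        ls = ls_grid[np.argmax([log_marginal_likelihood(X, y, l)
                                for l in ls_grid])]
        L, alpha = gp_fit(X, y, ls)
        mu, var = gp_predict(X, L, alpha, ls, cand)
        # Evaluate the candidate with the highest expected improvement.
        x_next = cand[np.argmax(expected_improvement(mu, var, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X, y

if __name__ == "__main__":
    f = lambda x: float((x[0] - 0.3) ** 2)  # toy objective, minimum at x = 0.3
    X, y = bayes_opt(f)
    print("best y:", y.min(), "at x:", X[np.argmin(y), 0])
```

On this smooth low-dimensional toy problem the loop quickly concentrates evaluations near the optimum; the paper's point is that scaling such a loop to many input dimensions is where the difficulties arise.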
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Bayesian optimization, global optimization, Gaussian process, high-dimensional
Submission Number: 11446