Unleashing the Potential of Acquisition Functions in High-Dimensional Bayesian Optimization
Abstract: Bayesian optimization (BO) is widely used to optimize expensive-to-evaluate black-box functions. It first builds a surrogate for the objective and quantifies its uncertainty. It then decides where to sample by maximizing an acquisition function (AF) defined by the surrogate model. However, when dealing with high-dimensional problems, finding the global maximum of the AF becomes increasingly challenging. In such cases, the manner in which the AF maximizer is initialized plays a pivotal role. An inappropriate initialization can severely limit the potential of the AF. This paper investigates a largely understudied problem: the impact of AF maximizer initialization on exploiting the capability of AFs. Our large-scale empirical study shows that the widely used random initialization strategy may fail to harness the potential of an AF. Based on this observation, we propose a better initialization approach that employs multiple heuristic optimizers to leverage the historical data of black-box optimization to generate initial points for the AF maximizer. We evaluate our approach on a variety of heavily studied synthetic test functions and real-world applications. Experimental results show that our techniques, while simple, can significantly enhance standard BO and outperform state-of-the-art methods by a large margin in most test cases.
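The initialization idea described in the abstract can be illustrated with a toy sketch. The code below is not the paper's implementation; all names and the stand-in acquisition surface are hypothetical. It contrasts the common random-initialization strategy with seeding the AF maximizer's restarts near the best point found so far (a crude stand-in for the heuristic optimizers leveraging historical data):

```python
import math
import random

def acquisition(x):
    # Stand-in acquisition surface (hypothetical): a narrow global peak
    # at x = 0.9 and a broad local peak at x = 0.2 on the domain [0, 1].
    return math.exp(-((x - 0.9) ** 2) / 0.001) \
        + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.05)

def local_search(x0, step=0.01, iters=200):
    # Simple hill-climbing AF maximizer started from x0.
    x = x0
    for _ in range(iters):
        for cand in (x - step, x + step):
            if 0.0 <= cand <= 1.0 and acquisition(cand) > acquisition(x):
                x = cand
    return x

def maximize_af(init_points):
    # Multi-restart AF maximization: run a local search from each
    # initial point and keep the best resulting candidate.
    finals = [local_search(x0) for x0 in init_points]
    return max(finals, key=acquisition)

random.seed(0)
history_best = 0.88  # assumed best-evaluated point from the BO history

# Random initialization: restarts drawn uniformly from the domain.
rand_inits = [random.random() for _ in range(5)]
# History-seeded initialization: restarts perturbed around history_best.
seeded_inits = [min(1.0, max(0.0, history_best + random.gauss(0.0, 0.02)))
                for _ in range(5)]

x_rand = maximize_af(rand_inits)
x_seed = maximize_af(seeded_inits)
```

With narrow peaks in high dimensions, random restarts rarely land in the basin of the global AF maximum, while history-seeded restarts start close to promising regions; here the seeded search reliably converges near the global peak at 0.9.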
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=50OqBR9SPE&noteId=wBIGPDHRgh
Changes Since Last Submission: We explain our modifications as follows:
1. We provide a proper review of the initialization schemes used in available BO packages in the Introduction Section to show that the research gap this paper aims to close truly exists. We show that popular BO packages, including BoTorch, skopt, GPyOpt and GPflowOpt, use random initialization (selecting initial points from a set of random points) as the default setting. We also compare our method AIBO to BO implementations with alternative AF maximizer initialization strategies in Section 6.5.
2. We explain in Section 6.1.1 that the only difference between AIBO and BO-grad is the initialization of the acquisition function optimization, not any other setting.
3. We modify the Motivation Section to show that, when using random initialization, even when the number of AF maximizer restarts is increased from 10 to 1,000, the quality of intermediate candidates generated during the AF maximization process remains poor. Furthermore, we show that even with 1,000 restarts, the performance of native BO-grad (AF-based selection) is still close to optimal selection and better than random selection among intermediate candidates, suggesting that the AF is effective at selecting a good sample from all candidates but is restricted by the pool of available candidates.
4. We compare the algorithmic runtime of AIBO and BO-grad to show that AIBO can use less algorithmic runtime to achieve better performance than the standard BO implementation. This paper evaluates different methods in terms of function evaluations, because in the case of expensive functions, where function evaluation dominates, AIBO's algorithmic runtime is negligible.
5. We modify the description of the proposed AIBO in the Introduction Section to make it clearer.
Assigned Action Editor: ~Xi_Lin2
Submission Number: 1663