EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose an Attention-DeepSets encoder with actor-critic RL and a GP virtual-environment training procedure to provide an end-to-end solution for multi-step lookahead BO.
Abstract: To avoid myopic behavior, multi-step lookahead Bayesian optimization (BO) algorithms consider the sequential nature of BO and have demonstrated promising results in recent years. However, owing to the curse of dimensionality, most of these methods make significant approximations or suffer scalability issues. This paper presents a novel reinforcement learning (RL)-based framework for multi-step lookahead BO in high-dimensional black-box optimization problems. The proposed method enhances the scalability and decision-making quality of multi-step lookahead BO by efficiently solving the sequential dynamic program of the BO process in a near-optimal manner using RL. We first introduce an Attention-DeepSets encoder to represent the state of knowledge to the RL agent and subsequently propose a multi-task fine-tuning procedure based on end-to-end (encoder-RL) on-policy learning. We evaluate the proposed method, EARL-BO (Encoder Augmented RL for BO), on synthetic benchmark functions and hyperparameter tuning problems, finding significantly improved performance compared to existing multi-step lookahead and high-dimensional BO methods.
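To make the encoder idea concrete, below is a minimal sketch (in PyTorch) of an Attention-DeepSets-style encoder for the BO state of knowledge: the set of observations {(x_i, y_i)} is embedded element-wise, mixed with self-attention, and pooled into a permutation-invariant summary vector that an actor-critic agent could consume. The layer sizes, pooling choice, and class name are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of an Attention-DeepSets-style state encoder for EARL-BO.
# Exact architecture details (widths, heads, pooling) are assumptions.
import torch
import torch.nn as nn


class AttentionDeepSetsEncoder(nn.Module):
    def __init__(self, x_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # phi: per-observation embedding of the concatenated (x, y) pair
        self.phi = nn.Sequential(
            nn.Linear(x_dim + 1, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # self-attention lets observation embeddings interact before pooling
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        # rho: maps the pooled set representation to the final state encoding
        self.rho = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_obs, x_dim), y: (batch, n_obs, 1)
        h = self.phi(torch.cat([x, y], dim=-1))   # (batch, n_obs, embed_dim)
        h, _ = self.attn(h, h, h)                 # set-wise interactions
        pooled = h.mean(dim=1)                    # permutation-invariant pooling
        return self.rho(pooled)                   # (batch, embed_dim)


if __name__ == "__main__":
    # Toy usage: encode a set of 10 observations from a 6-dimensional search space.
    enc = AttentionDeepSetsEncoder(x_dim=6)
    x = torch.rand(1, 10, 6)
    y = torch.randn(1, 10, 1)
    print(enc(x, y).shape)  # torch.Size([1, 64])
```

In the paper's setup, an encoding like this would be fed to the actor-critic networks, which are then trained on-policy by rolling out multi-step lookahead episodes against a Gaussian-process posterior acting as the virtual environment.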
Lay Summary: Optimizing complex systems, such as tuning training settings for a machine learning model, often involves trial and error, which can be expensive and time-consuming. A smart approach called Bayesian optimization (BO) helps choose the best experiments to run, but most methods only think one step ahead. Looking multiple steps ahead can lead to better decisions, but this quickly becomes computationally overwhelming, especially with many variables involved. We introduce a method called EARL-BO, which uses reinforcement learning (RL) to make smarter, multi-step decisions efficiently, even in high-dimensional problems. Key contributions include a framework for encoding the state of information and for training the RL algorithm. Our experiments show that EARL-BO outperforms existing methods in both synthetic tasks and real-world scenarios like tuning machine learning models.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Bayesian Optimization, Reinforcement Learning, High-dimensional Optimization, Nonmyopic Bayesian Optimization
Submission Number: 9113