Meta-learning of Black-box Solvers Using Deep Reinforcement Learning

Sofian Chaybouti; Ludovic Dos Santos; Cedric Malherbe; Aladin Virmaux

Published: 21 Oct 2022, Last Modified: 05 May 2023. Venue: NeurIPS 2022 Workshop MetaLearn (Poster). Readers: Everyone

Keywords: Meta-learning, Reinforcement Learning, Black-box, optimization

TL;DR: We use Deep Reinforcement Learning and Transformers to meta-learn black-box solvers

Abstract: Black-box optimization does not require any specification of the function to be optimized. As such, it represents one of the most general problems in optimization and is central to many scientific areas. However, in many practical cases, one must solve a sequence of black-box problems whose functions originate from a specific class and hence share similar patterns. Classical algorithms such as evolutionary or random methods treat each problem independently and are oblivious to the general underlying structure. In this paper, we introduce MELBA, an algorithm that exploits the similarities among a given class of functions to learn a task-specific solver tailored to efficiently optimize every function from this task. More precisely, given a class of functions, the proposed algorithm learns a Transformer-based Reinforcement Learning (RL) black-box solver. First, the Transformer embeds the previously gathered evaluation points and their images under the function into a latent state that characterizes the current stage of the optimization process. Then, the next evaluation point is sampled according to the latent state. The black-box solver is trained using PPO and the global regret on a training set. We show experimentally the effectiveness of our solvers on various synthetic and real-life tasks, including the hyperparameter optimization of ML models (SVM, XGBoost), and demonstrate that our approach is competitive with existing methods.
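The two-step loop described in the abstract (embed the evaluation history into a latent state, then sample the next point from it) can be sketched as follows. This is only an illustration of the interface, not MELBA itself: `embed_history` is a hand-written stand-in for the Transformer, `sample_next_point` is a fixed Gaussian rule rather than a policy trained with PPO, and all function names, the minimization convention, and the regret definition (best value found minus the known minimum) are assumptions made for this example.

```python
import numpy as np

def embed_history(xs, ys):
    """Stand-in for the Transformer encoder (assumption): summarize the
    history by the best point seen so far and its function value."""
    best = int(np.argmin(ys))
    return np.concatenate([xs[best], [ys[best]]])

def sample_next_point(latent, rng, sigma=0.3):
    """Stand-in for the learned policy (assumption): sample from a
    Gaussian centered on the best point stored in the latent state."""
    center = latent[:-1]
    return center + sigma * rng.standard_normal(center.shape)

def run_solver(f, dim, budget, rng):
    """Query f `budget` times, choosing each new point from the latent state."""
    xs = [rng.uniform(-1.0, 1.0, size=dim)]  # random first evaluation point
    ys = [f(xs[0])]
    for _ in range(budget - 1):
        latent = embed_history(np.array(xs), np.array(ys))
        x_next = sample_next_point(latent, rng)
        xs.append(x_next)
        ys.append(f(x_next))
    return np.array(xs), np.array(ys)

def simple_regret(ys, f_min):
    """Gap between the best value found and the known minimum (assumed definition)."""
    return float(np.min(ys) - f_min)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sphere = lambda x: float(np.sum(x ** 2))  # toy black-box with minimum 0
    xs, ys = run_solver(sphere, dim=2, budget=20, rng=rng)
    print("regret:", simple_regret(ys, f_min=0.0))
```

In a meta-learning setting, a regret of this kind, computed over functions drawn from the training class, would serve as the signal for updating the embedding and the sampling policy; here both are fixed, so nothing is learned.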
