Meta-Black-Box-Optimization through Offline Q-function Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: A novel offline learning-based MetaBBO framework tailored for Dynamic Algorithm Configuration in evolutionary algorithms.
Abstract: Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that using reinforcement learning (RL) to learn a meta-level policy for dynamic algorithm configuration (DAC) over an optimization task distribution can significantly enhance the performance of the low-level BBO algorithm. However, the online learning paradigm of existing works makes the efficiency of MetaBBO problematic. To address this, we propose an offline learning-based MetaBBO framework, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. Specifically, we first transform the DAC task into a long-sequence decision process. This allows us to further introduce an effective Q-function decomposition mechanism that reduces the learning difficulty within the intricate algorithm configuration space. Under this setting, we propose three novel designs to meta-learn the DAC policy from offline data: we first propose a novel collection strategy for constructing an offline DAC experience dataset with balanced exploration and exploitation; we then establish a decomposition-based Q-loss that incorporates conservative Q-learning to promote stable learning from the offline dataset; to further improve offline learning efficiency, we equip our framework with a Mamba architecture, whose selective state space model and hardware-aware parallel scan improve long-sequence learning effectiveness and efficiency, respectively. Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance to prior online/offline baselines, while significantly improving on the training efficiency of existing online baselines. We provide the source code of Q-Mamba \href{https://github.com/MetaEvo/Q-Mamba}{online}.
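The following is a minimal, illustrative sketch (not the authors' implementation; see the linked repository for that) of the two ideas named in the abstract: decomposing the Q-function over the hyper-parameter dimensions of the configuration space, and a conservative Q-learning (CQL)-style penalty for stable offline training. All module and argument names here (`DecomposedQHead`, `n_params`, `n_bins`, `alpha`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedQHead(nn.Module):
    """Predict one Q-vector per hyper-parameter dimension rather than one
    Q-value per joint configuration, shrinking the output size from
    n_bins ** n_params to n_params * n_bins."""

    def __init__(self, hidden_dim: int, n_params: int, n_bins: int):
        super().__init__()
        # One linear head per configurable hyper-parameter dimension.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, n_bins) for _ in range(n_params)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> (batch, n_params, n_bins)
        # h would come from a sequence backbone (e.g. a Mamba block) encoding
        # the optimization-state history; that backbone is omitted here.
        return torch.stack([head(h) for head in self.heads], dim=1)


def conservative_q_loss(q_all, actions, q_target, alpha=1.0):
    """Per-dimension TD loss plus a CQL-style regularizer that pushes down
    Q-values of out-of-dataset actions relative to the logged ones.

    q_all:    (batch, n_params, n_bins) predicted Q-values
    actions:  (batch, n_params) indices of the logged (dataset) actions
    q_target: (batch, n_params) bootstrapped TD targets per dimension
    """
    # Q-values of the actions actually taken in the offline dataset.
    q_data = q_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    td_loss = F.mse_loss(q_data, q_target)
    # Conservative penalty: logsumexp over action bins minus dataset-action Q.
    cql_penalty = (torch.logsumexp(q_all, dim=-1) - q_data).mean()
    return td_loss + alpha * cql_penalty
```

This sketch only conveys the shape of the decomposition and the conservative loss; the paper's actual per-dimension targets, data-collection strategy, and Mamba-based backbone are defined in the paper and released code.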
Lay Summary: We developed a novel way to teach a computer to automatically set the hyper-parameters of optimization algorithms. Existing methods let the computer interact with the optimization algorithm in an online paradigm, so their learning efficiency is relatively low. This paper therefore discusses how to make the computer capable of learning from historical experience, that is, offline learning with higher efficiency.
Link To Code: https://github.com/MetaEvo/Q-Mamba
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Black-Box Optimization, Dynamic Algorithm Configuration, Learning to Optimize, Offline Reinforcement Learning
Submission Number: 4473