Keywords: Black-Box Optimization, Dynamic Algorithm Configuration, Learning to Optimize, Offline Reinforcement Learning, Mamba
TL;DR: A novel MetaBBO framework that achieves both effectiveness and efficiency in auto-configuring black-box optimizers.
Abstract: Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that meta-training a neural-network-based meta-level control policy over a distribution of optimization tasks can significantly enhance the performance of low-level black-box optimizers. However, achieving such enhancement requires an effective policy optimization/search method to locate the optimal control policy within a massive joint-action space, and the online learning fashion of existing works further limits the training efficiency of MetaBBO. To address these technical challenges, we propose an offline learning framework, termed Q-Mamba. Concretely, our method uses a Mamba neural network architecture to meta-learn decomposed Q-functions for each configurable component in the low-level optimizer. By decomposing the Q-function over the configuration decisions of all components in an optimizer, we can apply effective sequence modelling and avoid searching for the control policy in the massive joint-action space. Furthermore, by leveraging Mamba's long-sequence modelling strengths and a moderate number of offline trajectory samples, Q-Mamba can be trained efficiently through a synergy of offline Temporal-Difference updates and Conservative Q-Learning regularization, achieving performance competitive with online learning paradigms. Through extensive benchmarking, we observe that Q-Mamba attains competitive or even superior optimization performance compared to prior online/offline learning baselines, while requiring substantially less training time than existing online learning baselines. Additional ablation studies confirm that each of the proposed key designs contributes to this performance.
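The Q-function decomposition described in the abstract can be illustrated with a minimal, self-contained sketch. All names and sizes below (`N_COMPONENTS`, `N_CHOICES`, the random stand-in for the learned Q-heads) are illustrative assumptions, not the paper's actual implementation; in Q-Mamba the per-component Q-values would come from a trained Mamba sequence model rather than a toy lookup.

```python
# Illustrative sketch: selecting an optimizer configuration component by
# component (decomposed Q-functions) instead of over the joint action space.
import itertools
import random

N_COMPONENTS = 3   # configurable components in the low-level optimizer (assumed)
N_CHOICES = 4      # discrete choices per component (assumed)

def q_values(prefix):
    """Stand-in for learned Q-values of the next component's choices,
    conditioned on the prefix of earlier decisions. In Q-Mamba this role
    is played by a Mamba sequence model; here it is a deterministic toy."""
    rng = random.Random(hash(prefix))
    return [rng.random() for _ in range(N_CHOICES)]

# Sequential greedy selection: only N_COMPONENTS * N_CHOICES Q evaluations.
config = []
for _ in range(N_COMPONENTS):
    q = q_values(tuple(config))
    config.append(max(range(N_CHOICES), key=q.__getitem__))
print("selected configuration:", config)

# Contrast: a single joint Q-function would have to score every joint action,
# i.e. N_CHOICES ** N_COMPONENTS = 4 ** 3 = 64 candidates here.
joint_space = list(itertools.product(range(N_CHOICES), repeat=N_COMPONENTS))
print("joint action space size:", len(joint_space))
```

The point of the sketch is the cost contrast: greedy per-component selection scales linearly in the number of components, while the joint action space grows exponentially, which is what makes sequence modelling over decomposed Q-functions attractive.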
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2058