Enhancing Sample Efficiency in Black-box Combinatorial Optimization via Symmetric Replay Training

Hyeonah Kim; Minsu Kim; Sungsoo Ahn; Jinkyoo Park

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Keywords: Black-box combinatorial optimization, sample efficiency, symmetries, drug discovery, hardware design, deep reinforcement learning, imitation learning

TL;DR: This paper proposes a general approach to improve the sample efficiency of DRL for black-box combinatorial optimization by exploiting symmetric transformations.

Abstract: Black-box combinatorial optimization (black-box CO) arises frequently in industrial fields such as drug discovery and hardware design. Despite its widespread relevance, solving black-box CO problems is highly challenging because the combinatorial solution space is vast and black-box function evaluations are resource-intensive. These difficulties significantly limit the efficacy of existing deep reinforcement learning (DRL) methods in practical problem settings. To explore efficiently under a limited budget of function evaluations, this paper introduces a new generic method for enhancing sample efficiency. We propose symmetric replay training, which leverages high-reward samples and their under-explored regions in the symmetric space. In replay training, the policy is trained to imitate the symmetric trajectories of these high-reward samples. The proposed method thereby promotes exploration of high-reward regions without requiring additional online interactions, so the replay step consumes no extra function evaluations. The experimental results show that our method consistently improves the sample efficiency of various DRL methods on real-world tasks, including molecular optimization and hardware design. Our source code is available at https://anonymous.4open.science/r/sym_replay.
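The replay step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy setting (closed tours whose cost is invariant to cyclic rotation), the tabular softmax policy, and the names `TabularPolicy`, `symmetric_variants`, and `symmetric_replay_step` are all assumptions chosen for illustration. The sketch shows the general idea only: take the best samples already evaluated, pick a symmetric action sequence that yields the same solution, and raise its likelihood under the policy without calling the objective again.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour; unchanged by cyclic rotation of `tour`."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def symmetric_variants(tour):
    """All cyclic rotations: different action sequences, same solution."""
    return [tour[i:] + tour[:i] for i in range(len(tour))]


class TabularPolicy:
    """Toy autoregressive policy (an assumption, not the paper's network).

    logits[prev][nxt] scores visiting `nxt` after `prev`; row n is the
    start state. Already-visited nodes are masked out of the softmax.
    """

    def __init__(self, n):
        self.n = n
        self.logits = [[0.0] * n for _ in range(n + 1)]

    def _probs(self, prev, unvisited):
        row = self.logits[prev]
        m = max(row[j] for j in unvisited)
        exps = {j: math.exp(row[j] - m) for j in unvisited}
        z = sum(exps.values())
        return {j: e / z for j, e in exps.items()}

    def log_prob(self, tour):
        prev, unvisited, total = self.n, set(range(self.n)), 0.0
        for a in tour:
            total += math.log(self._probs(prev, unvisited)[a])
            unvisited.remove(a)
            prev = a
        return total

    def imitate(self, tour, lr=0.5):
        """One gradient-ascent step on log p(tour), i.e. imitation learning."""
        prev, unvisited = self.n, set(range(self.n))
        for a in tour:
            probs = self._probs(prev, unvisited)
            for j, p in probs.items():
                # Gradient of log-softmax w.r.t. logit j: 1[j == a] - p_j.
                self.logits[prev][j] += lr * ((1.0 if j == a else 0.0) - p)
            unvisited.remove(a)
            prev = a


def symmetric_replay_step(policy, scored_samples, k, rng=random):
    """Imitate one random symmetric variant of each of the k best samples.

    `scored_samples` holds (tour, cost) pairs that were already evaluated,
    so this step makes no new calls to the objective.
    """
    best = sorted(scored_samples, key=lambda s: s[1])[:k]  # lower cost is better
    for tour, _ in best:
        policy.imitate(rng.choice(symmetric_variants(tour)))
```

In a full training loop, a step like this would presumably be interleaved with the usual on-policy DRL update; how the two are scheduled and which symmetries apply to a given task are details of the paper that this sketch does not attempt to reproduce.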

Supplementary Material: pdf

Submission Number: 1631
