Learning to Schedule Heuristics for the Simultaneous Stochastic Optimization of Mining Complexes

24 Nov 2021, 18:02 | ML4OR-22 Poster | Readers: Everyone
Keywords: Simultaneous stochastic optimization of mining complexes, Self-learning hyper-heuristic, Simulated annealing, Reinforcement learning, Low-level heuristics
TL;DR: A self-learning hyper-heuristic, learn-to-perturb (L2P), is proposed that combines multi-neighborhood simulated annealing with reinforcement learning to solve the simultaneous stochastic optimization of mining complexes.
Abstract: The simultaneous stochastic optimization of mining complexes (SSOMC) is a large-scale combinatorial optimization problem that manages the extraction of materials from multiple mines and their processing in interconnected facilities. Building on the work of Zarpellon et al. (2020) and Chmiela et al. (2021), this work proposes what is, to the best of our knowledge, the first data-driven framework for heuristic scheduling in a fully self-managed, hyper-heuristic-based solver for the SSOMC. The proposed learn-to-perturb (L2P) hyper-heuristic is a multi-neighborhood simulated annealing algorithm that selects the heuristic (perturbation) to apply in a self-adaptive manner, using reinforcement learning (RL) to efficiently explore which local search is best suited to a particular search point. Several state-of-the-art agents are incorporated into the hyper-heuristic to better adapt the search and guide it towards better solutions. By learning from data describing the performance of heuristics, the method obtains a problem-specific ordering of heuristics that collectively finds better solutions faster. The L2P is tested on several real-world mining complexes, with an emphasis on efficiency, robustness, and generalization capacity. Results show a 30-45% reduction in computational time.
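The general idea of a multi-neighborhood simulated annealing loop whose perturbation is chosen by a learned policy can be illustrated with a minimal sketch. This is not the authors' L2P implementation: the toy permutation objective, the three neighborhood moves, the epsilon-greedy value update (a simple bandit-style stand-in for the RL agents mentioned in the abstract), the reward definition, and all names and parameter values (`l2p_sketch`, `eps`, `lr`, `t0`, `cooling`) are assumptions made purely for illustration.

```python
import math
import random


def swap(x, rng):
    """Neighborhood 1 (hypothetical): swap two positions."""
    y = x[:]
    i, j = rng.sample(range(len(y)), 2)
    y[i], y[j] = y[j], y[i]
    return y


def reverse_segment(x, rng):
    """Neighborhood 2 (hypothetical): reverse a contiguous segment."""
    y = x[:]
    i, j = sorted(rng.sample(range(len(y)), 2))
    y[i:j + 1] = reversed(y[i:j + 1])
    return y


def shift(x, rng):
    """Neighborhood 3 (hypothetical): move one element to another position."""
    y = x[:]
    i, j = rng.sample(range(len(y)), 2)
    y.insert(j, y.pop(i))
    return y


def cost(x):
    """Toy objective (assumption): total displacement from sorted order."""
    return sum(abs(v - k) for k, v in enumerate(x))


def l2p_sketch(x0, heuristics, iters=5000, t0=10.0, cooling=0.999,
               eps=0.1, lr=0.1, seed=0):
    """Simulated annealing with epsilon-greedy selection of the perturbation."""
    rng = random.Random(seed)
    q = [0.0] * len(heuristics)      # estimated value of each heuristic
    counts = [0] * len(heuristics)   # how often each heuristic was applied
    x, fx = x0[:], cost(x0)
    best, fbest = x[:], fx
    t = t0
    for _ in range(iters):
        # Select a heuristic: explore with probability eps, else exploit.
        if rng.random() < eps:
            a = rng.randrange(len(heuristics))
        else:
            a = max(range(len(heuristics)), key=q.__getitem__)
        y = heuristics[a](x, rng)
        fy = cost(y)
        delta = fy - fx
        # Metropolis acceptance: always accept improvements, sometimes worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
        if fx < fbest:
            best, fbest = x[:], fx
        # Reward (assumption): the improvement the proposed move offered, if any.
        reward = max(0.0, -delta)
        q[a] += lr * (reward - q[a])
        counts[a] += 1
        t = max(t * cooling, 1e-9)   # geometric cooling with a small floor
    return best, fbest, q, counts


if __name__ == "__main__":
    start = list(range(12))
    random.Random(1).shuffle(start)
    best, fbest, q, counts = l2p_sketch(start, [swap, reverse_segment, shift])
    print("best cost:", fbest, "heuristic usage:", counts)
```

In this sketch the learned values `q` play the role of a schedule over heuristics: perturbations that have recently produced improvements are chosen more often, while the `eps` exploration rate keeps the others in play. A richer state description or a different agent could replace the value update without changing the annealing loop.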