Keywords: Reinforcement Learning; Simulated Annealing; Quasi-Equilibrium Cooling
Abstract: We present \textbf{RL‑QESA}—\underline{R}einforcement‑\underline{L}earning \underline{Q}uasi‑\underline{E}quilibrium \underline{S}imulated \underline{A}nnealing—a new framework that couples classical simulated annealing (SA) with an adaptive, learning‑based cooling schedule. A policy network observes block‑level statistics and lowers the temperature $T_{n+1}$ \emph{only when} the empirical energy moments at $T_n$ coincide with their quasi‑equilibrium predictions, certifying that the sampler has fully explored the current thermal state before cooling further. We show that RL‑QESA inherits SA’s classical convergence guarantees while permitting far richer cooling profiles than hand‑crafted schedules. On the Rosenbrock function and Lennard–Jones cluster benchmarks, RL‑QESA attains up to three‑fold faster convergence and consistently lower terminal energies compared with vanilla SA and recent neural variants. By automating temperature descent in a principled, quasi‑equilibrium fashion and retaining simple proposal mechanics, RL‑QESA offers a robust, learning‑driven optimiser for challenging global optimisation tasks.
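The cooling rule described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical Python illustration, not the authors' implementation: it assumes a plain Metropolis proposal, uses a simple block-to-block stabilisation test in place of the paper's exact quasi-equilibrium moment check, and substitutes a fixed geometric decrement for the learned cooling policy. All function names and parameters here are illustrative.

```python
import numpy as np

def metropolis_block(energy, x, T, n_steps, step_size, rng):
    """Run one block of Metropolis moves at fixed temperature T.
    Returns the final state and the energies visited in the block."""
    E = energy(x)
    energies = np.empty(n_steps)
    for i in range(n_steps):
        x_prop = x + step_size * rng.standard_normal(x.shape)
        E_prop = energy(x_prop)
        # Accept with the usual Metropolis probability exp(-dE / T).
        if E_prop <= E or rng.random() < np.exp(-(E_prop - E) / T):
            x, E = x_prop, E_prop
        energies[i] = E
    return x, energies

def quasi_equilibrium_reached(prev_block, curr_block, tol=0.05):
    """Hypothetical stand-in for the quasi-equilibrium test: declare
    quasi-equilibrium when the block-level energy mean and variance
    have stabilised between consecutive blocks."""
    dm = abs(curr_block.mean() - prev_block.mean())
    dv = abs(curr_block.var() - prev_block.var())
    mean_scale = abs(prev_block.mean()) + 1e-12
    var_scale = prev_block.var() + 1e-12
    return (dm / mean_scale < tol) and (dv / var_scale < tol)

def quasi_equilibrium_anneal(energy, x0, T0=10.0, T_min=1e-3, alpha=0.95,
                             block=500, step_size=0.1, seed=0):
    """Skeleton annealer: the temperature is lowered only after the
    quasi-equilibrium check passes. A fixed geometric factor `alpha`
    replaces the learned policy network of RL-QESA."""
    rng = np.random.default_rng(seed)
    x, T = np.asarray(x0, dtype=float), T0
    _, prev = metropolis_block(energy, x, T, block, step_size, rng)
    while T > T_min:
        x, curr = metropolis_block(energy, x, T, block, step_size, rng)
        if quasi_equilibrium_reached(prev, curr):
            T *= alpha  # cool only once the current thermal state is certified
        prev = curr
    return x, energy(x)

if __name__ == "__main__":
    rosenbrock = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2
    x_best, E_best = quasi_equilibrium_anneal(rosenbrock, x0=[-1.5, 2.0])
    print(x_best, E_best)
```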
Submission Number: 174