Reinforcement Learning for Predict+Optimize

Xinyi HU; Yuansen Cheng

14 Dec 2020 (modified: 05 May 2023); CUHK 2021 Course IERG5350 Blind Submission; Readers: Everyone

Keywords: reinforcement learning, combinatorial optimisation, task-based learning

TL;DR: This paper presents a framework to tackle Predict+Optimize problems using neural networks and reinforcement learning.

Abstract: Predict+Optimize (P+O) is a machine learning framework for optimization problems with unknown parameters. This paper presents a framework to tackle P+O problems using neural networks and reinforcement learning. We focus on the traveling salesman problem and train a recurrent neural network that, given a directed graph, predicts a distribution over edge permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network with a policy gradient method.
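
To make the training signal described in the abstract concrete, here is a minimal sketch of a policy gradient (REINFORCE) update that uses negative tour length as the reward on a small directed graph. It is an illustration under stated assumptions, not the authors' implementation: the recurrent neural network is replaced by a tabular softmax policy `theta[i][j]` (a hypothetical stand-in), the edge weights are assumed known rather than predicted, and the names `tour_length`, `sample_tour`, `reinforce`, and `greedy_tour` as well as the mean-reward baseline are choices made for this example only.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tour(theta, rng):
    """Sample a tour starting at city 0 and return it together with the
    gradient of its log-probability with respect to theta."""
    n = len(theta)
    tour, visited = [0], {0}
    grad = [[0.0] * n for _ in range(n)]
    while len(tour) < n:
        cur = tour[-1]
        choices = [j for j in range(n) if j not in visited]
        probs = softmax([theta[cur][j] for j in choices])
        nxt = rng.choices(choices, weights=probs)[0]
        # d log pi(nxt | cur) / d theta[cur][j] = 1[j == nxt] - probs[j]
        for j, p in zip(choices, probs):
            grad[cur][j] += (1.0 if j == nxt else 0.0) - p
        tour.append(nxt)
        visited.add(nxt)
    return tour, grad


def reinforce(dist, steps=50, batch=16, lr=0.1, seed=0):
    """REINFORCE with a mean-reward baseline (assumed here, not from the paper)."""
    rng = random.Random(seed)
    n = len(dist)
    theta = [[0.0] * n for _ in range(n)]  # tabular stand-in for the RNN policy
    for _ in range(steps):
        samples = [sample_tour(theta, rng) for _ in range(batch)]
        rewards = [-tour_length(t, dist) for t, _ in samples]  # reward = negative tour length
        baseline = sum(rewards) / batch
        for (_, grad), r in zip(samples, rewards):
            adv = (r - baseline) / batch
            for i in range(n):
                for j in range(n):
                    theta[i][j] += lr * adv * grad[i][j]  # gradient ascent on expected reward
    return theta


def greedy_tour(theta):
    """Decode a tour by always moving to the unvisited city with the highest logit."""
    n = len(theta)
    tour, visited = [0], {0}
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: theta[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour


# Directed 3-city example: the cycle 0 -> 1 -> 2 -> 0 is cheap, the reverse is expensive.
dist = [[0, 1, 5], [5, 0, 1], [1, 5, 0]]
theta = reinforce(dist)
best = greedy_tour(theta)
```

In a full P+O setting the `dist` matrix would itself come from a prediction model, and the policy would be the recurrent network described in the abstract; the tabular policy is used here only to keep the update rule visible.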
