Scalable Evolution Strategies Pipeline for Solving the Vehicle Routing Problem

Anonymous

17 Oct 2020 (modified: 05 May 2023) · Submitted to LMCA 2020
Keywords: reinforcement learning, evolution strategies, vehicle routing problem
TL;DR: Improving the scalability of reinforcement learning models by using evolution strategies.
Abstract: Deep Reinforcement Learning (RL) is a general framework for applying deep learning methods to decision problems and has many applications. In this paper we study Deep RL as it applies to the Vehicle Routing Problem (VRP). Specifically, we focus on the capacitated variant of the VRP (CVRP), in which vehicles have a maximum carrying capacity and customers have varied demands. Several recent papers in the literature apply Deep RL to the CVRP. While these methods can produce solutions fairly quickly, they all rely on GPUs to train their models, which limits scalability. Recently, OpenAI released a study comparing Evolution Strategies (ES) with classic Deep RL training methods such as Policy Gradient (PG), and found that ES uses fewer resources while performing comparably to state-of-the-art Deep RL training methods. The main benefit is that ES can be trained on CPUs in parallel, which costs less than training on a GPU. In light of this, we replace the traditional RL training methods used in prior work with ES and compare the results.
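For context, the ES approach referenced above estimates a gradient of the expected return from the returns of randomly perturbed copies of the policy parameters, so each perturbation can be evaluated on a separate CPU worker and only scalar returns need to be exchanged. The following is a minimal sketch of that update, not the paper's actual pipeline; the `evaluate_policy` rollout function and all hyperparameter values below are hypothetical placeholders.

```python
# Minimal sketch of an OpenAI-style Evolution Strategies update.
# `evaluate_policy` is a hypothetical stand-in for rolling out a routing
# policy on sampled CVRP instances; the toy objective below is only for
# illustration.
import numpy as np

def evaluate_policy(theta: np.ndarray) -> float:
    """Placeholder rollout: in the CVRP setting this would return an
    episodic reward such as the negative total route length."""
    return -float(np.sum(theta ** 2))  # toy stand-in objective

def es_step(theta, rng, n_perturbations=50, sigma=0.1, alpha=0.01):
    """One ES update: sample Gaussian perturbations, evaluate each
    perturbed parameter vector (parallelizable across CPU workers),
    and combine the scalar returns into a gradient estimate."""
    eps = rng.standard_normal((n_perturbations, theta.size))
    returns = np.array([evaluate_policy(theta + sigma * e) for e in eps])
    # Rank-normalize returns, a common robustness trick in ES.
    ranks = returns.argsort().argsort().astype(np.float64)
    advantages = (ranks - ranks.mean()) / (ranks.std() + 1e-8)
    grad_estimate = (advantages[:, None] * eps).sum(axis=0) / (n_perturbations * sigma)
    return theta + alpha * grad_estimate

rng = np.random.default_rng(0)
theta = np.zeros(16)
for _ in range(100):
    theta = es_step(theta, rng)
```

Because each worker only returns a scalar, the communication cost stays small even with many CPU workers, which is the scalability argument made in the abstract.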