- TL;DR: Learning pivoting rules of the simplex algorithm for solving linear programs to improve the solution times, demonstrated on linear approximations of travelling salesman problem.
- Abstract: Linear Programs (LPs) are a fundamental class of optimization problems with a wide variety of applications. Fast algorithms for solving LPs are the workhorse of many combinatorial optimization algorithms, especially those involving integer programming. One popular method to solve LPs is the simplex method which, at each iteration, traverses the surface of the polyhedron of feasible solutions. At each vertex of the polyhedron, one of several heuristics chooses the next neighboring vertex, and these vary in accuracy and computational cost. We use deep value-based reinforcement learning to learn a pivoting strategy that at each iteration chooses between two of the most popular pivot rules -- Dantzig and steepest edge. Because the latter is typically more accurate and computationally costly than the former, we assign a higher wall time-based cost to steepest edge iterations than Dantzig iterations. We optimize this weighted cost on a neural net architecture designed for the simplex algorithm. We obtain between 20% to 50% reduction in the gap between weighted iterations of the individual pivoting rules, and the best possible omniscient policies for LP relaxations of randomly generated instances of five-city Traveling Salesman Problem.
- Keywords: Simplex Algorithm, Pivoting Rules, Reinforcement Learning, Combinatorial Optimization, Supervised Learning, Travelling Salesman Problem