<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      Recent Advances in Deep Learning for Routing Problems &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/2021/12/01/deep-learning-for-routing-problems/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">Recent Advances in Deep Learning for Routing Problems</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#deep-learning"> deep-learning </a>
  
    <a class="content-tag" href="/tags/#graph-neural-networks"> graph-neural-networks </a>
  
    <a class="content-tag" href="/tags/#combinatorial-optimization"> combinatorial-optimization </a>
  
    <a class="content-tag" href="/tags/#travelling-salesperson-problem"> travelling-salesperson-problem </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p><strong>TL;DR</strong> Developing neural network-driven solvers for combinatorial optimization problems such as the Travelling Salesperson Problem has seen a surge of academic interest recently. This blogpost presents a <strong>Neural Combinatorial Optimization</strong> pipeline that unifies several recently proposed model architectures and learning paradigms into a single framework. Through the lens of the pipeline, we analyze recent advances in deep learning for routing problems and provide new directions to stimulate future research towards practical impact.</p>

<ul class="table-of-content" id="markdown-toc">
  <li><a href="#background-on-combinatorial-optimization-problems" id="markdown-toc-background-on-combinatorial-optimization-problems">Background on Combinatorial Optimization Problems</a>    <ul>
      <li><a href="#tsp-and-routing-problems" id="markdown-toc-tsp-and-routing-problems">TSP and Routing Problems</a></li>
      <li><a href="#deep-learning-to-solve-routing-problems" id="markdown-toc-deep-learning-to-solve-routing-problems">Deep Learning to solve Routing Problems</a></li>
      <li><a href="#neural-combinatorial-optimization" id="markdown-toc-neural-combinatorial-optimization">Neural Combinatorial Optimization</a></li>
    </ul>
  </li>
  <li><a href="#unified-neural-combinatorial-optimization-pipeline" id="markdown-toc-unified-neural-combinatorial-optimization-pipeline">Unified Neural Combinatorial Optimization Pipeline</a>    <ul>
      <li><a href="#1-defining-the-problem-via-graphs" id="markdown-toc-1-defining-the-problem-via-graphs">(1) Defining the problem via graphs</a></li>
      <li><a href="#2-obtaining-latent-embeddings-for-graph-nodes-and-edges" id="markdown-toc-2-obtaining-latent-embeddings-for-graph-nodes-and-edges">(2) Obtaining latent embeddings for graph nodes and edges</a></li>
      <li><a href="#3--4-converting-embeddings-into-discrete-solutions" id="markdown-toc-3--4-converting-embeddings-into-discrete-solutions">(3 + 4) Converting embeddings into discrete solutions</a></li>
      <li><a href="#5-training-the-model" id="markdown-toc-5-training-the-model">(5) Training the model</a></li>
    </ul>
  </li>
  <li><a href="#characterizing-prominent-papers-via-the-pipeline" id="markdown-toc-characterizing-prominent-papers-via-the-pipeline">Characterizing Prominent Papers via the Pipeline</a></li>
  <li><a href="#recent-advances-and-avenues-for-future-work" id="markdown-toc-recent-advances-and-avenues-for-future-work">Recent Advances and Avenues for Future Work</a>    <ul>
      <li><a href="#leveraging-equivariance-and-symmetries" id="markdown-toc-leveraging-equivariance-and-symmetries">Leveraging Equivariance and Symmetries</a></li>
      <li><a href="#improved-graph-search-algorithms" id="markdown-toc-improved-graph-search-algorithms">Improved Graph Search Algorithms</a></li>
      <li><a href="#learning-to-improve-sub-optimal-solutions" id="markdown-toc-learning-to-improve-sub-optimal-solutions">Learning to Improve Sub-optimal Solutions</a></li>
      <li><a href="#learning-paradigms-that-promote-generalization" id="markdown-toc-learning-paradigms-that-promote-generalization">Learning Paradigms that Promote Generalization</a></li>
      <li><a href="#improved-evaluation-protocols" id="markdown-toc-improved-evaluation-protocols">Improved Evaluation Protocols</a></li>
    </ul>
  </li>
  <li><a href="#summary" id="markdown-toc-summary">Summary</a></li>
</ul>

<hr />

<h2 id="background-on-combinatorial-optimization-problems">Background on Combinatorial Optimization Problems</h2>

<p><strong>Combinatorial Optimization</strong> is a practical field at the intersection of mathematics and computer science that aims to solve constrained optimization problems, many of which are NP-Hard. <strong>NP-Hard problems</strong> are challenging because no polynomial-time algorithm is known for them, and exhaustively searching their solution spaces quickly exceeds the limits of modern computers. As a result, solving large NP-Hard instances to proven optimality is generally intractable.</p>

<p><strong>Why should we care?</strong> Because robust and reliable approximation algorithms for popular problems have immense practical applications and are the backbone of modern industries. For example, the <strong>Travelling Salesperson Problem</strong> (TSP) is the most popular Combinatorial Optimization Problem (COP) and comes up in applications ranging from logistics and scheduling to genomics and systems biology.</p>

<blockquote>
  <p>TSP is so famous, or notorious, that it even has an <a href="https://xkcd.com/399/">xkcd comic</a> dedicated to it!</p>
</blockquote>

<h3 id="tsp-and-routing-problems">TSP and Routing Problems</h3>

<p>TSP is also a classic example of a <strong>Routing Problem</strong>: a class of COPs that require a sequence of nodes (e.g. cities) or edges (e.g. roads between cities) to be traversed in a specific order while fulfilling a set of constraints or optimising a set of variables. TSP requires a set of edges to be traversed in an order that visits every node exactly once and returns to the starting node. In graph-theoretic terms, the optimal “tour” for our salesperson is the Hamiltonian cycle with the minimal total distance (or travel time); see Figure 1 for an illustration.</p>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/tsp-gif.gif" width="60%" />
  <figcaption><b>Figure 1:</b> TSP asks the following question: Given a list of cities and the distances between each pair of cities, what is the <b>shortest possible route</b> that a salesperson can take to <b>visit each city</b> and <b>return to the origin city</b>? 
  <!-- Formally, TSP is defined as a constrained optimization problem on a graph: one needs to search the space of permutations to find an optimal sequence of nodes, called a tour, with minimal total edge weights or tour length.  -->
  (Source: <a href="http://mathgifs.blogspot.com/2014/03/the-traveling-salesman.html">MathGifs</a>)</figcaption>
</center></figure>
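<p>To make the objective concrete, here is a minimal Python sketch (our own illustration; the function names are hypothetical) that computes the length of a closed tour and finds the optimal tour by brute-force enumeration. Fixing the first city removes rotated copies of the same cycle, yet $(n-1)!$ orderings still remain, which is why exhaustive search is hopeless beyond a handful of cities.</p>

```python
import itertools
import math


def tour_length(coords, tour):
    """Total Euclidean length of a closed tour (it returns to the first city)."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def brute_force_tsp(coords):
    """Exact TSP by enumerating every Hamiltonian cycle that starts at city 0.

    Only feasible for tiny instances: (n - 1)! candidate tours are checked.
    """
    n = len(coords)
    best_tour, best_len = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(coords, tour)
        if length < best_len:
            best_tour, best_len = list(tour), length
    return best_tour, best_len


# Four cities on the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour, length = brute_force_tsp(square)
print(tour, length)  # [0, 1, 2, 3] 4.0
```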

<p>In real-world scenarios, Routing Problems, or Vehicle Routing Problems (VRPs), can involve challenging constraints beyond those of the somewhat <em>vanilla</em> TSP. For example, the <strong>TSP with Time Windows</strong> (TSPTW) adds a “time window” constraint to the nodes of a TSP graph: certain nodes can only be visited during fixed time intervals. Another variant, the <strong>Capacitated Vehicle Routing Problem</strong> (CVRP), aims to find the optimal routes for a fleet of vehicles (i.e. multiple salespersons) visiting a set of customers (i.e. cities), where each vehicle has a maximum carrying capacity.</p>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/vrps.png" width="50%" />
  <figcaption><b>Figure 2:</b> TSP and the associated class of Vehicle Routing Problems. VRPs can be characterized by their constraints, and this figure presents the relatively well-studied ones. There could be VRPs in the wild with <b>more complex</b> and <b>non-standard constraints</b>! (Source: adapted from <a href="https://ieeexplore.ieee.org/abstract/document/6887420">Benslimane and Benadada, 2014</a>)</figcaption>
</center></figure>
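<p>The extra constraints are easy to state in code even though they make the search much harder. The sketch below is our own illustration (hypothetical function names, not taken from Benslimane and Benadada): it checks whether a CVRP solution respects vehicle capacity and whether a TSPTW tour reaches every node inside its window. We assume a depot at index 0, travel time equal to Euclidean distance, and that a vehicle arriving early waits until the window opens.</p>

```python
import math


def cvrp_feasible(routes, demands, capacity):
    """Check a CVRP solution. Each route lists the customers served by one
    vehicle (the depot, index 0, is implicit at both ends). Every customer
    must be served exactly once and no route may exceed the capacity."""
    served = sorted(c for route in routes for c in route)
    if served != list(range(1, len(demands))):
        return False
    return all(capacity >= sum(demands[c] for c in route) for route in routes)


def tsptw_feasible(coords, windows, tour):
    """Check a TSPTW tour that starts and ends at the depot tour[0].
    windows[i] = (earliest, latest) is the allowed arrival interval of node i."""
    time = windows[tour[0]][0]
    for prev, node in zip(tour, tour[1:]):
        time += math.dist(coords[prev], coords[node])
        earliest, latest = windows[node]
        if time > latest:
            return False  # arrived after the window closed
        time = max(time, earliest)  # wait if the window is not open yet
    # The return leg to the depot must also respect the depot's window.
    time += math.dist(coords[tour[-1]], coords[tour[0]])
    return windows[tour[0]][1] >= time


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
windows = [(0, 10), (0, 2), (0, 3), (5, 6)]
print(tsptw_feasible(coords, windows, [0, 1, 2, 3]))  # True
print(tsptw_feasible(coords, windows, [0, 3, 2, 1]))  # False
print(cvrp_feasible([[1, 2], [3]], [0, 4, 3, 5], capacity=7))  # True
```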

<h3 id="deep-learning-to-solve-routing-problems">Deep Learning to solve Routing Problems</h3>

<p>Developing reliable algorithms and solvers for routing problems requires significant <strong>expert intuition</strong> and years of <strong>trial-and-error</strong>. For example, the state-of-the-art TSP solver, <strong>Concorde</strong>, leverages over 50 years of research on linear programming, cutting plane algorithms and branch-and-bound; here is an <a href="https://www.youtube.com/watch?v=q8nQTNvCrjE">inspiring video</a> on its history. Concorde can find optimal solutions for instances with up to tens of thousands of nodes, but with extremely long execution times. As you can imagine, designing algorithms for complex VRPs is even more challenging and time-consuming, especially with real-world constraints such as capacities or time windows in the mix.</p>

<p>This has led the machine learning community to ask the following question:</p>

<p><strong>Can we use deep learning to automate and augment expert intuition required for solving COPs?</strong></p>

<blockquote>
  <p>See this masterful survey from Mila for more in-depth motivation: [<a href="https://arxiv.org/abs/1811.06128">Bengio et al., 2020</a>].</p>
</blockquote>

<h3 id="neural-combinatorial-optimization">Neural Combinatorial Optimization</h3>

<p>Neural Combinatorial Optimization is an attempt to use <strong>deep learning as a hammer</strong> to hit the <strong>COP nails</strong>. Neural networks are trained to produce approximate solutions to COPs by directly learning from problem instances themselves. This line of research started at Google Brain with the seminal <a href="https://arxiv.org/abs/1506.03134">Seq2seq Pointer Networks</a> and <a href="https://arxiv.org/abs/1611.09940">Neural Combinatorial Optimization with RL</a> papers. Today, <a href="https://arxiv.org/abs/2102.09544">Graph Neural Networks</a> are usually the architecture of choice at the core of deep learning-driven solvers, as they naturally operate on the graph structure of these problems.</p>

<p>Neural Combinatorial Optimization aims to improve over traditional COP solvers in the following ways:</p>

<ul>
  <li>
    <p><strong>No handcrafted heuristics.</strong> Instead of application experts manually designing heuristics and rules, neural networks learn them by imitating an optimal solver or via reinforcement learning (we describe a pipeline for this in the next section).</p>
  </li>
  <li>
    <p><strong>Fast inference on GPUs.</strong> Traditional solvers can have prohibitive execution times for large-scale problems, e.g. Concorde took 7.5 months to solve the largest TSP with 109,399 nodes. On the other hand, once a neural network has been trained to approximately solve a COP, it has significantly more favorable time complexity and can be parallelized on GPUs. This makes such models highly desirable for real-time decision-making problems, especially routing problems.</p>
  </li>
  <li>
    <p><strong>Tackling novel and under-studied COPs.</strong> Neural combinatorial optimization can significantly speed up the development of problem-specific solvers for novel or understudied problems with esoteric constraints. Such problems often arise in scientific discovery or computer architecture; an exciting success story is <a href="https://www.nature.com/articles/s41586-021-03544-w">Google’s chip floorplanning system</a>, which was used to design the next generation of TPUs. You read that right: <strong>the next TPU chip for running neural networks was designed with the help of a neural network!</strong></p>
  </li>
</ul>

<hr />

<h2 id="unified-neural-combinatorial-optimization-pipeline">Unified Neural Combinatorial Optimization Pipeline</h2>

<p>Using TSP as a canonical example, we now present a generic <strong>neural combinatorial optimization pipeline</strong> that can be used to characterize modern deep learning-driven approaches to several routing problems.</p>

<p>State-of-the-art approaches for TSP take the raw coordinates of cities as input and leverage <strong>GNNs</strong> or <strong>Transformers</strong> combined with classical <strong>graph search</strong> algorithms to constructively build approximate solutions. Architectures can be broadly classified as: (1) <strong>autoregressive</strong> approaches, which build solutions in a step-by-step fashion; and (2) <strong>non-autoregressive</strong> models, which produce the solution in one shot. Models can be trained to <strong>imitate optimal solvers</strong> via supervised learning or by minimizing the length of TSP tours via <strong>reinforcement learning</strong>.</p>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/pipeline-box.png" width="75%" />
  <figcaption><b>Figure 3:</b> Neural combinatorial optimization pipeline (Source: <a href="https://arxiv.org/abs/2006.07054">Joshi et al., 2021</a>).</figcaption>
</center></figure>

<p>The 5-stage pipeline from <a href="https://arxiv.org/abs/2006.07054">Joshi et al., 2021</a> brings together prominent model architectures and learning paradigms into <strong>one unified framework</strong>. This will enable us to dissect and analyze recent developments in deep learning for routing problems, and provide new directions to stimulate future research.</p>

<h3 id="1-defining-the-problem-via-graphs">(1) Defining the problem via graphs</h3>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/pipeline-1.png" width="60%" />
  <figcaption><b>Figure 4: Problem Definition:</b> TSP is formulated via a fully-connected graph of cities/nodes, which can be sparsified further.</figcaption>
</center></figure>

<p>TSP is formulated via a fully-connected graph where <strong>nodes</strong> correspond to <strong>cities</strong> and <strong>edges</strong> denote <strong>roads</strong> between them. The graph can be sparsified via heuristics such as k-nearest neighbors. This enables models to scale up to large instances where pairwise computation for all nodes is intractable [<a href="https://arxiv.org/abs/1704.01665">Khalil et al., 2017</a>] or learn faster by reducing the search space [<a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a>].</p>
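<p>A minimal sketch of k-nearest-neighbour sparsification in plain Python (our own illustration with a hypothetical <code>knn_graph</code> helper; practical implementations would vectorise this). The number of edges drops from $n(n-1)/2$ for the complete graph to at most $nk$, which is what makes large instances tractable.</p>

```python
import math


def knn_graph(coords, k):
    """Sparsify the complete TSP graph by connecting every city to its k
    nearest neighbours. Returns sorted undirected edges (i, j) with i before j."""
    n = len(coords)
    edges = set()
    for i in range(n):
        # Rank all other cities by their Euclidean distance to city i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


cities = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(knn_graph(cities, k=1))  # [(0, 1), (1, 2), (2, 3)]
```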

<h3 id="2-obtaining-latent-embeddings-for-graph-nodes-and-edges">(2) Obtaining latent embeddings for graph nodes and edges</h3>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/pipeline-2.png" width="60%" />
  <figcaption><b>Figure 5: Graph Embedding:</b> Embeddings for each graph node are obtained using a <b>Graph Neural Network</b> encoder, which builds local structural features via recursively aggregating features from each node's neighbors.</figcaption>
</center></figure>

<p>A GNN or Transformer encoder computes <strong>hidden representations</strong> or embeddings for each node and/or edge in the input TSP graph. At each layer, nodes gather features from their neighbors to represent <strong>local graph structure</strong> via recursive message passing. Stacking $L$ layers allows the network to build representations from the $L$-hop neighborhood of each node.</p>
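<p>The $L$-hop receptive field can be seen in a toy example. The sketch below is our own simplification, not any cited architecture: an isotropic layer that adds the mean of the neighbours' features to a node's own features, with scalar weights standing in for learnable matrices. A signal placed on one end of a path graph reaches exactly the nodes within $L$ hops after $L$ layers.</p>

```python
def gnn_layer(h, neighbours, w_self=1.0, w_neigh=1.0):
    """One simplified message-passing layer. Each node adds the mean of its
    neighbours' features to its own (scaled by w_self and w_neigh) and applies
    a ReLU. Scalar weights stand in for the learnable weight matrices of a real
    GNN; every node is assumed to have at least one neighbour."""
    new_h = []
    for i, h_i in enumerate(h):
        nbrs = neighbours[i]
        mean = [sum(h[j][d] for j in nbrs) / len(nbrs) for d in range(len(h_i))]
        new_h.append([max(0.0, w_self * x + w_neigh * m) for x, m in zip(h_i, mean)])
    return new_h


def gnn_encoder(h, neighbours, num_layers):
    """Stack num_layers layers: node i then sees its num_layers-hop neighbourhood."""
    for _ in range(num_layers):
        h = gnn_layer(h, neighbours)
    return h


# Path graph 0-1-2-3-4 with a one-hot signal on node 0.
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
signal = [[1.0], [0.0], [0.0], [0.0], [0.0]]
print([x[0] for x in gnn_encoder(signal, path, num_layers=2)])  # [1.5, 1.0, 0.25, 0.0, 0.0]
```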

<p><strong>Anisotropic</strong> and <strong>attention-based GNNs</strong> such as Transformers [<a href="https://hanalog.polymtl.ca/wp-content/uploads/2018/11/cpaior-learning-heuristics-6.pdf">Deudon et al., 2018</a>, <a href="https://arxiv.org/abs/1803.08475">Kool et al., 2019</a>] and Gated Graph ConvNets [<a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a>] have emerged as the default choice for encoding routing problems. The attention mechanism during neighborhood aggregation is critical as it allows each node to weigh its neighbors based on their <strong>relative importance</strong> for solving the task at hand.</p>

<blockquote>
  <p>Importantly, the Transformer encoder can be seen as an attentional GNN, i.e. a <a href="https://petar-v.com/GAT/">Graph Attention Network (GAT)</a>, on a fully-connected graph. See <a href="https://thegradient.pub/transformers-are-graph-neural-networks/">this blogpost</a> for an intuitive explanation.</p>
</blockquote>
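<p>To illustrate attention-based neighbourhood aggregation, here is a minimal single-head sketch of our own. It uses Transformer-style scaled dot-product scores (GAT itself uses a different, additive scoring function) and omits the query/key/value projections for brevity. When every node's neighbourhood is the whole graph, including itself, this is an unprojected Transformer self-attention step.</p>

```python
import math


def attention_aggregate(h, neighbours):
    """Single-head scaled dot-product attention restricted to each node's
    neighbourhood. Query/key/value projections are omitted (identity) to keep
    the sketch short. Returns the new features and the attention weights."""
    d = len(h[0])
    new_h, weights = [], []
    for i, h_i in enumerate(h):
        nbrs = neighbours[i]
        scores = [sum(a * b for a, b in zip(h_i, h[j])) / math.sqrt(d) for j in nbrs]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = [e / z for e in exps]  # softmax over the neighbourhood
        weights.append(alpha)
        new_h.append([sum(a * h[j][k] for a, j in zip(alpha, nbrs)) for k in range(d)])
    return new_h, weights


features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Every node attends to all nodes (including itself): a complete graph.
full_graph = [[0, 1, 2] for _ in features]
_, alpha = attention_aggregate(features, full_graph)
print(alpha[0])  # node 0 weighs the similar node 1 more than node 2
```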

<h3 id="3--4-converting-embeddings-into-discrete-solutions">(3 + 4) Converting embeddings into discrete solutions</h3>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/pipeline-3.png" width="70%" />
  <figcaption><b>Figure 5: Solution Decoding and Search:</b> Probabilities are assigned to each node or edge for <b>belonging to the solution set</b> (here, an MLP makes a prediction per edge to obtain a 'heatmap' of edge probabilities), and then converted into <b>discrete decisions</b> through classical graph search techniques such as greedy search or beam search.</figcaption>
</center></figure>

<p>Once the nodes and edges of the graph have been encoded into latent representations, we must decode them into discrete TSP solutions.
This is done via a two-step process. First, probabilities are assigned to each node or edge for belonging to the solution set, either independently of one another (i.e. <strong>Non-autoregressive decoding</strong>) or conditionally through graph traversal (i.e. <strong>Autoregressive decoding</strong>). Next, the predicted probabilities are converted into discrete decisions through classical <strong>graph search techniques</strong> such as greedy search or beam search guided by the probabilistic predictions (more on graph search later, when we discuss recent trends and future directions).</p>
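<p>The simplest graph search step, greedy decoding of an edge-probability heatmap, can be sketched as follows (our own illustration; <code>greedy_decode</code> is a hypothetical name). Because visited cities are excluded at every step, the output is always a valid tour, however noisy the predicted probabilities are.</p>

```python
def greedy_decode(heatmap, start=0):
    """Turn an n x n matrix of edge probabilities into a valid tour by
    repeatedly moving to the most probable unvisited city."""
    n = len(heatmap)
    tour, visited = [start], {start}
    while len(tour) != n:
        current = tour[-1]
        nxt = max((j for j in range(n) if j not in visited),
                  key=lambda j: heatmap[current][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour


heatmap = [
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.0, 0.2, 0.7],
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.7, 0.0, 0.0],
]
print(greedy_decode(heatmap))  # [0, 2, 1, 3]
```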

<p>The choice of decoder comes with tradeoffs between <strong>data-efficiency</strong> and <strong>efficiency of implementation</strong>:
Autoregressive (AR) decoders [<a href="https://arxiv.org/abs/1803.08475">Kool et al., 2019</a>] cast TSP as a Seq2Seq or <strong>language translation task</strong> from a set of unordered cities to an ordered tour. They explicitly model the <strong>sequential inductive bias</strong> of routing problems through step-by-step selection of one node at a time. On the other hand, non-autoregressive (NAR) decoders [<a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a>] cast TSP as the task of producing <strong>edge probability heatmaps</strong>. The NAR approach is significantly faster and better suited for real-time inference, as it produces predictions in <strong>one shot</strong> instead of step-by-step. However, it ignores the sequential nature of TSP and may be less efficient to train than AR decoding when the two are compared fairly [<a href="https://arxiv.org/abs/2006.07054">Joshi et al., 2021</a>].</p>
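<p>For contrast with the one-shot heatmap, here is a heavily simplified sketch of autoregressive decoding, written by us for illustration. We assume the query is just the embedding of the last visited city (the Attention Model of Kool et al., 2019 uses a richer context and learned projections). At each of the $n-1$ steps, unvisited cities are scored, visited ones are masked out, and a softmax turns scores into probabilities; each step depends on the previous choice, which is why AR decoding cannot be done in one shot.</p>

```python
import math


def autoregressive_decode(embeddings, start=0):
    """Pointer-style greedy decoding: score every unvisited city against the
    last visited city's embedding, normalise with a softmax over unvisited
    cities only (visited ones are masked out), and pick the most probable.
    Returns the tour and the log-probability of each decision."""
    n = len(embeddings)
    tour, log_probs = [start], []
    visited = {start}
    while len(tour) != n:
        query = embeddings[tour[-1]]
        candidates = [j for j in range(n) if j not in visited]
        scores = [sum(q * k for q, k in zip(query, embeddings[j])) for j in candidates]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
        best = max(range(len(candidates)), key=lambda c: scores[c])
        tour.append(candidates[best])
        visited.add(candidates[best])
        log_probs.append(scores[best] - log_z)
    return tour, log_probs


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tour, log_probs = autoregressive_decode(embeddings)
print(tour)  # [0, 1, 3, 2]
```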

<h3 id="5-training-the-model">(5) Training the model</h3>

<p>Finally, the entire encoder-decoder model is trained in an <strong>end-to-end</strong> fashion, exactly like deep learning models for computer vision or natural language processing. In the simplest case, models can be trained to produce close-to-optimal solutions by <strong>imitating an optimal solver</strong>, i.e. via supervised learning. For TSP, the <strong>Concorde</strong> solver is used to generate labelled training datasets of optimal tours for millions of random instances. Models with AR decoders are trained via teacher-forcing to output the optimal sequence of tour nodes [<a href="https://arxiv.org/abs/1506.03134">Vinyals et al., 2015</a>], while those with NAR decoders are trained to distinguish edges traversed during the tour from non-traversed edges [<a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a>].</p>
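<p>For NAR decoders, supervised training amounts to edge classification against the optimal tour. A minimal sketch under our own assumptions: the labels are the adjacency matrix of the solver's tour, and because only two of the $n-1$ edges at each node belong to the tour, positive edges are up-weighted by a hypothetical <code>pos_weight</code> parameter. The exact weighting used by Joshi et al., 2019 may differ.</p>

```python
import math


def tour_to_adjacency(tour, n):
    """Ground-truth labels: target[i][j] = 1 if edge (i, j) is in the tour."""
    target = [[0] * n for _ in range(n)]
    for k in range(len(tour)):
        a, b = tour[k], tour[(k + 1) % len(tour)]
        target[a][b] = target[b][a] = 1
    return target


def weighted_bce(heatmap, target, pos_weight):
    """Mean binary cross-entropy over all off-diagonal entries, with the
    positive (in-tour) edges up-weighted by pos_weight."""
    eps = 1e-9  # guards against log(0)
    n = len(heatmap)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p, y = heatmap[i][j], target[i][j]
            total -= pos_weight * y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count


target = tour_to_adjacency([0, 1, 2, 3], n=4)  # e.g. a tour labelled by Concorde
confident = [[0.99 if target[i][j] else 0.01 for j in range(4)] for i in range(4)]
uninformed = [[0.5] * 4 for _ in range(4)]
print(weighted_bce(confident, target, 2.0), weighted_bce(uninformed, target, 2.0))
```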

<p>However, creating labelled datasets for supervised learning is an <strong>expensive</strong> and <strong>time-consuming process</strong>. Especially for very large problem instances, optimal solvers may not terminate within a practical time budget, so their exactness guarantees no longer materialise and inexact solutions end up being used for supervised training. This is far from ideal from both practical and theoretical standpoints [<a href="https://arxiv.org/abs/2002.09398">Yehuda et al., 2020</a>].</p>

<p><strong>Reinforcement learning</strong> is an elegant alternative in the absence of ground-truth solutions, as is often the case for understudied problems. As routing problems generally require sequential decision making to <strong>minimize a problem-specific cost function</strong> (e.g. the tour length for TSP), they can naturally be cast in the RL framework, which trains an agent to <strong>maximize a reward</strong> (the negative of the cost function). Models with AR decoders can be trained via standard policy gradient algorithms [<a href="https://arxiv.org/abs/1803.08475">Kool et al., 2019</a>] or Q-Learning [<a href="https://arxiv.org/abs/1704.01665">Khalil et al., 2017</a>].</p>
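<p>The policy-gradient route can be summarised by the REINFORCE estimator with a baseline, $\nabla_\theta \mathcal{L} = \mathbb{E}\left[(L(\pi) - b(s))\, \nabla_\theta \log p_\theta(\pi \mid s)\right]$, where $L(\pi)$ is the tour length and $b(s)$ a baseline; Kool et al., 2019 use the greedy rollout of a baseline policy. The plain-Python sketch below (our own, hypothetical names) only computes the surrogate loss value; in an autodiff framework such as PyTorch or JAX, differentiating it with the advantages held constant yields the estimator above.</p>

```python
def reinforce_loss(log_probs, costs, baselines):
    """Surrogate loss for REINFORCE with a baseline on a batch of tours.

    log_probs[b]: log-probability of sampled tour b (sum over decoding steps)
    costs[b]:     its tour length
    baselines[b]: baseline cost, e.g. from a greedy rollout
    Minimising this lowers the probability of tours that are worse than the
    baseline (positive advantage) and raises it for better ones."""
    advantages = [c - b for c, b in zip(costs, baselines)]
    loss = sum(a * lp for a, lp in zip(advantages, log_probs)) / len(costs)
    return loss, advantages


loss, adv = reinforce_loss(log_probs=[-1.0, -2.0], costs=[5.0, 3.0], baselines=[4.0, 4.0])
print(loss, adv)  # 0.5 [1.0, -1.0]
```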

<hr />

<h2 id="characterizing-prominent-papers-via-the-pipeline">Characterizing Prominent Papers via the Pipeline</h2>

<p>We can characterize prominent works in deep learning for TSP through the 5-stage pipeline. Recall that the pipeline consists of: (1) Problem Definition → (2) Graph Embedding → (3) Solution Decoding → (4) Solution Search → (5) Policy Learning. Starting from the Pointer Networks paper by Oriol Vinyals and collaborators, the following <strong>table</strong> highlights in <span style="color:red">Red</span> the major innovations and contributions for several notable and early papers.</p>

<table>
  <thead>
    <tr>
      <th>Paper</th>
      <th>Definition</th>
      <th>Graph Embedding</th>
      <th>Solution Decoding</th>
      <th>Solution Search</th>
      <th>Policy Learning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://arxiv.org/abs/1506.03134">Vinyals et al., 2015</a></td>
      <td>Sequence</td>
      <td><span style="color:red">Seq2Seq</span></td>
      <td><span style="color:red">Attention (AR)</span></td>
      <td>Beam Search</td>
      <td>Imitation (SL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1611.09940">Bello et al., 2017</a></td>
      <td>Sequence</td>
      <td>Seq2Seq</td>
      <td>Attention (AR)</td>
      <td>Sampling</td>
      <td><span style="color:red">Actor-critic (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1704.01665">Khalil et al., 2017</a></td>
      <td><span style="color:red">Sparse Graph</span></td>
      <td><span style="color:red">Structure2vec</span></td>
      <td>MLP (AR)</td>
      <td>Greedy Search</td>
      <td><span style="color:red">DQN (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://hanalog.polymtl.ca/wp-content/uploads/2018/11/cpaior-learning-heuristics-6.pdf">Deudon et al., 2018</a></td>
      <td>Full Graph</td>
      <td><span style="color:red">Transformer Encoder</span></td>
      <td>Attention (AR)</td>
      <td>Sampling + <span style="color:red">Local Search</span></td>
      <td>Actor-critic (RL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1803.08475">Kool et al., 2019</a></td>
      <td>Full Graph</td>
      <td><span style="color:red">Transformer Encoder</span></td>
      <td>Attention (AR)</td>
      <td>Sampling</td>
      <td><span style="color:red">Rollout (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a></td>
      <td>Sparse Graph</td>
      <td><span style="color:red">Residual Gated GCN</span></td>
      <td><span style="color:red">MLP Heatmap (NAR)</span></td>
      <td>Beam Search</td>
      <td>Imitation (SL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1911.04936">Ma et al., 2020</a></td>
      <td>Full Graph</td>
      <td>GCN</td>
      <td><span style="color:red">RNN + Attention (AR)</span></td>
      <td>Sampling</td>
      <td>Rollout (RL)</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="recent-advances-and-avenues-for-future-work">Recent Advances and Avenues for Future Work</h2>

<p>With the unified 5-stage pipeline in place, let us highlight some <strong>recent advances</strong> and <strong>trends</strong> in deep learning for routing problems. We will also provide some future research directions with a focus on improving generalization to large-scale and real-world instances.</p>

<h3 id="leveraging-equivariance-and-symmetries">Leveraging Equivariance and Symmetries</h3>

<p>One of the most influential early works, the autoregressive Attention Model [<a href="https://arxiv.org/abs/1803.08475">Kool et al., 2019</a>], considers TSP as a Seq2Seq language translation problem and sequentially constructs TSP tours as permutations of cities. One immediate drawback of this formulation is that it does not consider the <strong>underlying symmetries of routing problems</strong>.</p>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/pomo.png" width="75%" />
  <figcaption><b>Figure 6:</b> In general, a TSP instance has one unique optimal solution (left). However, under the autoregressive formulation, where a solution is represented as a sequence of nodes, <b>multiple optimal permutations</b> exist (right). (Source: <a href="https://arxiv.org/abs/2010.16011">Kwon et al., 2020</a>)</figcaption>
</center></figure>

<p><strong>POMO: Policy Optimization with Multiple Optima</strong> [<a href="https://arxiv.org/abs/2010.16011">Kwon et al., 2020</a>] proposes to leverage invariance to the starting city in the constructive autoregressive formulation. They train the same Attention Model, but with a new reinforcement learning algorithm (step 5 in the pipeline) which exploits the existence of multiple optimal tour permutations.</p>
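<p>The symmetry POMO exploits, and its shared baseline, can be sketched in a few lines (our own illustration with hypothetical names). Any rotation of an optimal tour is the same cycle written from another start city, so it has the same length. POMO rolls out one trajectory per start city on the same instance and uses the mean cost of these rollouts as a common baseline for all of them.</p>

```python
import math


def tour_length(coords, tour):
    """Total Euclidean length of a closed tour."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def rotate_tour(tour, start):
    """Write the same cycle so that it begins at city `start`."""
    k = tour.index(start)
    return tour[k:] + tour[:k]


def pomo_advantages(costs):
    """Shared baseline: the mean cost of the rollouts from all start cities."""
    baseline = sum(costs) / len(costs)
    return [c - baseline for c in costs]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
optimal = [0, 1, 2, 3]
# All n rotations describe the same optimal cycle and have the same length.
print([tour_length(square, rotate_tour(optimal, s)) for s in range(4)])  # [4.0, 4.0, 4.0, 4.0]
print(pomo_advantages([4.0, 5.0, 6.0]))  # [-1.0, 0.0, 1.0]
```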

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/equivariance.png" width="75%" />
  <figcaption><b>Figure 7:</b> TSP solutions remain unchanged under the <b>Euclidean symmetry group</b> of rotations, reflections, and translations applied to the city coordinates. Incorporating these symmetries into the model may be a principled approach to tackling large-scale TSPs.</figcaption>
</center></figure>

<p>Similarly, a very recent upgrade of the Attention Model by <a href="https://arxiv.org/abs/2110.03595">Ouyang et al., 2021</a> considers invariance with respect to <strong>rotations, reflections,</strong> and <strong>translations</strong> (i.e. the Euclidean symmetry group) of the input city coordinates. They propose an autoregressive approach that ensures invariance by performing data augmentation during the problem definition stage (pipeline step 1) and using relative coordinates during graph encoding (pipeline step 2). Their approach shows particularly strong results on zero-shot generalization from random instances to the real-world TSPLib benchmark suite.</p>
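<p>Data augmentation with Euclidean symmetries is straightforward to sketch. The example below is our own (the exact augmentation scheme of Ouyang et al., 2021 may differ): it applies a random rotation, an optional reflection and a random translation to all cities, and confirms that the length of a tour, and therefore which tour is optimal, does not change.</p>

```python
import math
import random


def tour_length(coords, tour):
    """Total Euclidean length of a closed tour."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def random_euclidean_transform(coords, rng):
    """Randomly rotate, possibly reflect, and translate all city coordinates.
    Pairwise distances are preserved, so every tour keeps its length."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    reflect = rng.random() > 0.5
    tx, ty = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    out = []
    for x, y in coords:
        if reflect:
            x = -x  # mirror across the y-axis before rotating
        out.append((c * x - s * y + tx, s * x + c * y + ty))
    return out


rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(6)]
augmented = random_euclidean_transform(cities, rng)
tour = [0, 3, 1, 5, 2, 4]
print(math.isclose(tour_length(cities, tour), tour_length(augmented, tour)))  # True
```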

<p>Future work may follow the <a href="https://geometricdeeplearning.com/"><strong>Geometric Deep Learning (GDL)</strong></a> blueprint for architecture design by using GNNs which are equivariant to Euclidean symmetries, e.g. E(n)-GNNs [<a href="https://arxiv.org/abs/2102.09844">Satorras et al., 2021</a>]. 
GDL tells us to explicitly think about and incorporate the symmetries and inductive biases that govern the data or problem at hand. As routing problems are <strong>embedded in Euclidean coordinates</strong> and the <strong>routes are cyclical</strong>, incorporating these constraints directly into the model architectures or learning paradigms may be a principled approach to improving generalization to instances larger than those seen during training.</p>

<h3 id="improved-graph-search-algorithms">Improved Graph Search Algorithms</h3>

<p>Another influential research direction has been the one-shot non-autoregressive Graph ConvNet approach [<a href="https://arxiv.org/abs/1906.01227">Joshi et al., 2019</a>]. Several recent papers have proposed retaining the same Gated GCN encoder (pipeline step 2) while replacing the beam search component (pipeline step 4) with <strong>more powerful</strong> and <strong>flexible graph search algorithms</strong>, e.g. Dynamic Programming [<a href="https://arxiv.org/abs/2102.11756">Kool et al., 2021</a>] or Monte-Carlo Tree Search (MCTS) [<a href="https://arxiv.org/abs/2012.10658">Fu et al., 2020</a>].</p>

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/heatmaps.png" width="75%" />
  <figcaption><b>Figure 8:</b> The Gated GCN encoder <a href="https://arxiv.org/abs/1906.01227">[Joshi et al., 2019]</a> can be used to produce <b>edge prediction 'heatmaps'</b> (in transparent red color) for TSP, CVRP, and TSPTW. These can be further processed by <a href="https://arxiv.org/abs/2102.11756">DP</a> or <a href="https://arxiv.org/abs/2012.10658">MCTS</a> to output routes (in solid colors). The GCN essentially reduces the solution search space, so that sophisticated search algorithms remain tractable where searching over all possible routes would not be. (Source: <a href="https://arxiv.org/abs/2102.11756">Kool et al., 2021</a>)</figcaption>
</center></figure>
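<p>The sketch below illustrates how a heatmap can shrink the search space for a dynamic-programming-style search. It uses two ingredients in simplified form: the heatmap restricts each city to its top-<code class="language-plaintext highlighter-rouge">k</code> neighbours, and a beam of partial tours is pruned with a DP dominance rule (of two partial tours that visit the same set of cities and end at the same city, only the cheaper one can lead to a better completion). The distance-based <code class="language-plaintext highlighter-rouge">stand_in_heatmap</code> and ranking the beam by accumulated heat are assumptions for illustration, not Kool et al.'s exact scoring.</p>

```python
import math
import random

def stand_in_heatmap(coords, temperature=0.1):
    """Hypothetical stand-in for a GCN heatmap: row-normalised exp(-distance / T)."""
    n = len(coords)
    heat = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-math.dist(coords[i], coords[j]) / temperature)
               for j in range(n)]
        total = sum(row)
        heat.append([v / total for v in row])
    return heat

def dp_beam_search(coords, heat, beam_width=16, k=5):
    """Heatmap-sparsified beam search with DP-style dominance pruning, from city 0."""
    n = len(coords)
    # Sparsify: from each city, only move to its k highest-scoring neighbours.
    cand = [sorted((j for j in range(n) if j != i), key=lambda j: -heat[i][j])[:k]
            for i in range(n)]
    beam = [(0.0, 0.0, (0,))]  # (accumulated heat, cost so far, partial tour)
    for _ in range(n - 1):
        best = {}  # DP state (visited set, current city) -> cheapest partial tour
        for score, cost, tour in beam:
            last, visited = tour[-1], set(tour)
            nexts = [j for j in cand[last] if j not in visited]
            if not nexts:  # sparse graph exhausted: fall back to all unvisited cities
                nexts = [j for j in range(n) if j not in visited]
            for j in nexts:
                state = (frozenset(visited | {j}), j)
                new_cost = cost + math.dist(coords[last], coords[j])
                if state not in best or new_cost < best[state][1]:  # dominance rule
                    best[state] = (score + heat[last][j], new_cost, tour + (j,))
        beam = sorted(best.values(), key=lambda s: -s[0])[:beam_width]
    # Close every surviving tour and return the shortest (length, tour) pair.
    return min((cost + math.dist(coords[tour[-1]], coords[0]), list(tour))
               for _, cost, tour in beam)

random.seed(1)
coords = [(random.random(), random.random()) for _ in range(12)]
length, tour = dp_beam_search(coords, stand_in_heatmap(coords))
print(round(length, 3), tour)
```

<p>The dominance rule is what separates this from plain beam search: merging equivalent states frees beam slots for genuinely different partial solutions.</p>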

<p>The <a href="https://arxiv.org/abs/2012.10658">GCN + MCTS framework</a> by Fu et al. in particular has a very interesting approach to <strong>training models efficiently on trivially small TSPs</strong> and successfully <strong>transferring the learnt policy to larger graphs</strong> in a zero-shot fashion (something that the original GCN + Beam Search by Joshi et al. struggled with). They ensure that the predictions of the GCN encoder generalize from small to large TSPs by updating the problem definition (pipeline step 1): a large problem instance is represented as many smaller sub-graphs of the same size as the GCN's training graphs, and the GCN edge predictions on these sub-graphs are merged into a single heatmap before performing MCTS.</p>
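<p>The sub-graph idea can be sketched as follows, assuming a model that only ever sees <code class="language-plaintext highlighter-rouge">m</code>-node graphs: sample a random centre city together with its nearest neighbours, rescale the sub-graph to the unit square, predict edge probabilities, and average each edge's predictions over all sub-graphs that contain it. The <code class="language-plaintext highlighter-rouge">stand_in_model</code> function, the rescaling step, and the averaging rule are assumptions for illustration rather than Fu et al.'s exact procedure.</p>

```python
import math
import random

def sample_subgraph(coords, m, rng):
    """A random centre city plus its m - 1 nearest neighbours."""
    centre = rng.randrange(len(coords))
    by_distance = sorted(range(len(coords)), key=lambda j: math.dist(coords[centre], coords[j]))
    return by_distance[:m]

def stand_in_model(sub_coords):
    """Hypothetical stand-in for a GCN trained on m-node graphs: an m x m matrix of
    edge probabilities (here a row-wise softmax over negative distances)."""
    m = len(sub_coords)
    probs = []
    for i in range(m):
        row = [0.0 if i == j else math.exp(-10.0 * math.dist(sub_coords[i], sub_coords[j]))
               for j in range(m)]
        z = sum(row)
        probs.append([v / z for v in row])
    return probs

def merged_heatmap(coords, m=10, n_samples=200, seed=0):
    """Predict on many small sub-graphs and average the predictions per edge."""
    rng = random.Random(seed)
    n = len(coords)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        nodes = sample_subgraph(coords, m, rng)
        xs = [coords[v][0] for v in nodes]
        ys = [coords[v][1] for v in nodes]
        x0, y0 = min(xs), min(ys)
        span = max(max(xs) - x0, max(ys) - y0) or 1.0
        # Rescale to the unit square so the sub-graph resembles the training graphs.
        local = [((x - x0) / span, (y - y0) / span) for x, y in zip(xs, ys)]
        probs = stand_in_model(local)
        for a, u in enumerate(nodes):
            for b, v in enumerate(nodes):
                total[u][v] += probs[a][b]
                count[u][v] += 1
    # Average over the sub-graphs containing each edge; unseen edges keep score 0.
    return [[total[u][v] / count[u][v] if count[u][v] else 0.0 for v in range(n)]
            for u in range(n)]

rng = random.Random(1)
coords = [(rng.random(), rng.random()) for _ in range(100)]
heat = merged_heatmap(coords)
```

<p>Because the model never sees more than <code class="language-plaintext highlighter-rouge">m</code> cities at once, its inputs stay in-distribution no matter how large the full instance is; only the cheap merging step scales with the instance size.</p>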

<p>Overall, this line of work suggests that <strong>stronger coupling</strong> between the design of both the <strong>neural</strong> and <strong>symbolic/search</strong> components of models is essential for out-of-distribution generalization [<a href="https://arxiv.org/abs/2003.00330">Lamb et al., 2020</a>]. However, it is also worth noting that designing highly customized and parallelized implementations of graph search on GPUs may be challenging for each new problem.</p>

<blockquote>
  <p>On a tangential note, Yoshua Bengio recently had a <a href="https://www.facebook.com/yoshua.bengio/posts/4220503864721188">very interesting take</a> on hybridization of neural and symbolic AI.</p>
</blockquote>

<h3 id="learning-to-improve-sub-optimal-solutions">Learning to Improve Sub-optimal Solutions</h3>

<p>Recently, a number of papers have explored an alternative to constructive AR and NAR decoding schemes which involves <strong>learning to iteratively improve (sub-optimal) solutions</strong> or <strong>learning to perform local search</strong>, starting with <a href="https://arxiv.org/abs/1912.05784">Wu et al., 2021</a>. Other notable papers include the works of <a href="https://arxiv.org/abs/2004.01608">da Costa et al., 2020</a>, <a href="https://arxiv.org/abs/2110.02544">Ma et al., 2021</a>, <a href="https://arxiv.org/abs/2110.07983">Xin et al., 2021</a>, and <a href="https://arxiv.org/abs/2110.05291">Hudson et al., 2021</a>.</p>
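<p>To make "learning to improve" concrete, the sketch below runs a 2-opt improvement loop on a TSP tour, where a pluggable policy picks which pair of edges to swap at each step. The hypothetical <code class="language-plaintext highlighter-rouge">greedy_policy</code> simply picks the most improving move; in the learned variants it would be replaced by a neural network that scores candidate moves.</p>

```python
import math
import random

def tour_length(coords, tour):
    """Length of the closed tour that returns to its first city."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt_delta(coords, tour, i, j):
    """Length change from reversing tour[i+1..j], i.e. swapping edges (i, i+1), (j, j+1)."""
    n = len(tour)
    a, b = coords[tour[i]], coords[tour[i + 1]]
    c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
    return math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)

def apply_two_opt(tour, i, j):
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def greedy_policy(coords, tour, moves):
    """Hypothetical stand-in for a learned policy: choose the most improving move."""
    return min(moves, key=lambda mv: two_opt_delta(coords, tour, *mv))

def improve(coords, tour, policy, steps=50):
    """Always apply the chosen move (even a worsening one) and keep the best tour seen."""
    n = len(tour)
    moves = [(i, j) for i in range(n - 1) for j in range(i + 2, n)
             if not (i == 0 and j == n - 1)]  # skip moves on adjacent edges
    best, best_len = list(tour), tour_length(coords, tour)
    for _ in range(steps):
        tour = apply_two_opt(tour, *policy(coords, tour, moves))
        length = tour_length(coords, tour)
        if length < best_len:
            best, best_len = list(tour), length
    return best

rng = random.Random(3)
coords = [(rng.random(), rng.random()) for _ in range(20)]
start = list(range(20))
best = improve(coords, start, greedy_policy)
print(round(tour_length(coords, start), 3), round(tour_length(coords, best), 3))
```

<p>The loop applies the chosen move even when it lengthens the tour and tracks the best tour seen, in the spirit of these methods: a policy can then walk past local optima instead of stopping at the first one.</p>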

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/cyclic-pe.png" width="75%" />
  <figcaption><b>Figure 9:</b> Architectures which learn to improve sub-optimal TSP solutions by guiding decisions within local search algorithms. (a) The original Transformer encoder-decoder architecture <a href="https://arxiv.org/abs/1912.05784">[Wu et al., 2021]</a> which used <b>sinusoidal positional encodings</b> to represent the current sub-optimal tour permutation; (b) <a href="https://arxiv.org/abs/2110.02544">Ma et al., 2021</a>'s upgrade through the lens of symmetry: the Dual-aspect Transformer encoder-decoder  with <b>learnable positional encodings</b> which capture the cyclic nature of TSP tours; (c) Visualizations of sinusoidal vs. cyclical positional encodings.</figcaption>
</center></figure>
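<p>Panel (c) contrasts the two kinds of positional encodings. The sketch below computes the standard sinusoidal encoding and a simplified cyclic variant whose integer frequencies make it periodic over a tour of <code class="language-plaintext highlighter-rouge">n</code> positions, so the last position is exactly as close to the first as the second one is. This cyclic variant is our own illustration of the property, not the exact encoding of Ma et al., 2021.</p>

```python
import math

def sinusoidal_pe(pos, d):
    """Standard Transformer encoding of a linear position (Vaswani et al., 2017)."""
    pe = []
    for k in range(d // 2):
        angle = pos / (10000 ** (2 * k / d))
        pe += [math.sin(angle), math.cos(angle)]
    return pe

def cyclic_pe(pos, d, n):
    """Simplified cyclic encoding with period n: integer frequencies 1..d/2, so
    position n maps to the same vector as position 0."""
    pe = []
    for k in range(d // 2):
        angle = 2 * math.pi * (k + 1) * pos / n
        pe += [math.sin(angle), math.cos(angle)]
    return pe

n, d = 20, 8
for name, enc in [("sinusoidal", lambda p: sinusoidal_pe(p, d)),
                  ("cyclic", lambda p: cyclic_pe(p, d, n))]:
    first_to_second = math.dist(enc(0), enc(1))
    first_to_last = math.dist(enc(0), enc(n - 1))
    print(f"{name}: |PE(0)-PE(1)| = {first_to_second:.3f}, |PE(0)-PE(n-1)| = {first_to_last:.3f}")
```

<p>With the sinusoidal encoding, the first and last cities of a tour look far apart even though they are joined by an edge; the cyclic encoding removes this artificial boundary.</p>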

<p>In all these works, since deep learning is used to <strong>guide decisions</strong> within classical local search algorithms (which are designed to work regardless of problem scale), this approach implicitly leads to <strong>better zero-shot generalization</strong> to larger problem instances compared to the constructive approaches. This is a very desirable property for practical implementations, as it may be intractable to train on very large or real-world TSP instances. Notably, <strong>NeuroLKH</strong> [<a href="https://arxiv.org/abs/2110.07983">Xin et al., 2021</a>] uses edge probability heatmaps produced via GNNs to improve the <strong>classical Lin-Kernighan-Helsgaun algorithm</strong> and demonstrates strong zero-shot generalization to TSP with 5000 nodes as well as across TSPLib instances.</p>
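<p>One way a heatmap can guide LKH is through its candidate sets: LKH only considers moves along a small set of candidate edges per node (five by default, ranked by alpha-nearness), and NeuroLKH, as we understand it, ranks these candidates with GNN edge scores instead. The sketch below builds top-<code class="language-plaintext highlighter-rouge">k</code> candidate sets from a heatmap and measures how many edges of a tour they cover; the inverse-distance heatmap is a hypothetical stand-in for GNN output.</p>

```python
import math

def candidate_sets(heat, k=5):
    """Top-k candidate neighbours per node, ranked by predicted edge score."""
    n = len(heat)
    return [sorted((j for j in range(n) if j != i), key=lambda j: -heat[i][j])[:k]
            for i in range(n)]

def candidate_coverage(tour, cands):
    """Fraction of tour edges that appear in the candidate sets (in either direction)."""
    n = len(tour)
    hits = 0
    for i in range(n):
        u, v = tour[i], tour[(i + 1) % n]
        if v in cands[u] or u in cands[v]:
            hits += 1
    return hits / n

# Toy check: cities evenly spaced on a circle, whose optimal tour visits them in order.
n = 12
coords = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]
heat = [[0.0 if i == j else 1.0 / (1.0 + math.dist(coords[i], coords[j])) for j in range(n)]
        for i in range(n)]  # hypothetical stand-in for GNN edge scores
cands = candidate_sets(heat, k=5)
print(candidate_coverage(list(range(n)), cands))
```

<p>High coverage of optimal-tour edges is what makes a candidate set useful: the search stays small, yet the edges it needs are still reachable.</p>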

<blockquote>
  <p>For the interested reader, DeepMind’s <a href="https://arxiv.org/abs/2105.02761">Neural Algorithmic Reasoning</a> research program offers a unique meta-perspective on the intersection of neural networks with classical algorithms.</p>
</blockquote>

<p>A limitation of this line of work is its reliance on <strong>hand-designed local search algorithms</strong>, which may not exist for novel or understudied problems. On the other hand, constructive approaches are arguably easier to adapt to new problems, as constraints can be enforced during the solution decoding and search procedure.</p>
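<p>A common way constructive decoders enforce constraints is action masking: before each decoding step, infeasible choices receive zero probability. The sketch below shows this for the vehicle capacity constraint in CVRP; the instance, the logits, and the helper names are hypothetical.</p>

```python
import math

def feasible_mask(demands, visited, remaining_capacity, at_depot):
    """Which nodes may be chosen next; index 0 is the depot, the rest are customers."""
    mask = [not at_depot or all(visited[1:])]  # no depot-to-depot move unless finished
    for j in range(1, len(demands)):
        mask.append(not visited[j] and demands[j] <= remaining_capacity)
    return mask

def masked_softmax(logits, mask):
    """Softmax restricted to feasible actions; infeasible ones get probability 0."""
    top = max(x for x, ok in zip(logits, mask) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    z = sum(exps)
    return [e / z for e in exps]

demands = [0, 3, 4, 2, 5]                     # hypothetical instance, depot demand 0
visited = [False, True, False, False, False]  # customer 1 already served
mask = feasible_mask(demands, visited, remaining_capacity=4, at_depot=False)
probs = masked_softmax([0.1, 2.0, 0.5, 0.3, 1.5], mask)
print(mask, [round(p, 3) for p in probs])
```

<p>Adapting such a decoder to a new constraint mostly means writing a new mask function, whereas a local search method would need new, constraint-aware moves.</p>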

<h3 id="learning-paradigms-that-promote-generalization">Learning Paradigms that Promote Generalization</h3>

<p>Future work could look at <strong>novel learning paradigms</strong> which explicitly focus on generalization beyond supervised and reinforcement learning. At present, most papers propose to train models efficiently on trivially small and random TSPs, then transfer the learnt policy to larger graphs and real-world instances in a zero-shot fashion. The logical next step is to fine-tune the model on a small number of specific problem instances. Thus, it will be interesting to explore <strong>fine-tuning as a meta-learning problem</strong>, wherein the goal is to train model parameters specifically for fast adaptation to new data distributions and problems.</p>
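<p>As a toy illustration of what "training for fast adaptation" means, the sketch below uses Reptile (Nichol et al., 2018), a simple first-order meta-learning algorithm, on a one-parameter problem where each task is a quadratic loss with its own optimum. Treating routing instance distributions as tasks and network weights as <code class="language-plaintext highlighter-rouge">theta</code> is our assumption about how this direction might be instantiated; the task distribution is hypothetical.</p>

```python
import random

def adapt(theta, task_opt, steps, lr=0.1):
    """Inner loop: plain gradient descent on the toy loss (theta - task_opt) ** 2."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - task_opt)
    return theta

def reptile(n_iters=1000, inner_steps=5, meta_lr=0.1, seed=0):
    """Reptile outer loop: nudge the initialisation towards each task's adapted weights."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_iters):
        task_opt = rng.gauss(3.0, 0.5)  # hypothetical task distribution
        theta += meta_lr * (adapt(theta, task_opt, inner_steps) - theta)
    return theta

theta_meta = reptile()
new_task = 3.4
for name, init in [("from scratch", 0.0), ("meta-learned", theta_meta)]:
    tuned = adapt(init, new_task, steps=1)  # a single fine-tuning step
    print(f"{name}: loss after 1 step = {(tuned - new_task) ** 2:.4f}")
```

<p>The meta-learned initialisation sits near the centre of the task distribution, so a single fine-tuning step on an unseen task already gets much closer than starting from scratch.</p>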

<p>Another interesting direction could explore <strong>tackling understudied routing problems</strong> with challenging constraints via multi-task pre-training on well-known routing problems such as TSP and CVRP, followed by problem-specific fine-tuning. Similar to <strong>language modelling as a pre-training objective</strong> in <a href="https://ruder.io/nlp-imagenet/">Natural Language Processing</a>, the goal of pre-training for routing would be to learn generally useful latent representations that can transfer well to novel routing problems.</p>

<h3 id="improved-evaluation-protocols">Improved Evaluation Protocols</h3>

<p>Beyond algorithmic innovations, there have been repeated calls from the community for <strong>more realistic evaluation protocols</strong> which can lead to advances on real-world routing problems and adoption by industry [<a href="https://arxiv.org/abs/1909.13121">Francois et al., 2019</a>, <a href="https://arxiv.org/abs/2002.09398">Yehuda et al., 2020</a>]. Most recently, <a href="https://arxiv.org/abs/2109.13983">Accorsi et al., 2021</a> have provided an authoritative set of <strong>guidelines for experiment design</strong> and <strong>comparisons</strong> to classical Operations Research (OR) techniques. They hope that fair and rigorous comparisons on <strong>standardized benchmarks</strong> will be the first step towards the integration of deep learning techniques into industrial routing solvers.</p>
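<p>Fair comparisons ultimately come down to a few numbers per instance. A minimal sketch, assuming results are collected as <code class="language-plaintext highlighter-rouge">(instance, cost, reference cost, seconds)</code> tuples (a hypothetical format), computes the optimality gap commonly reported in this literature together with the total runtime, so that solution quality is never reported without the time it took.</p>

```python
def optimality_gap(cost, reference_cost):
    """Relative gap (in %) to an optimal or best-known reference cost."""
    return 100.0 * (cost - reference_cost) / reference_cost

def summarize(results):
    """results: (instance name, cost, reference cost, wall-clock seconds) tuples."""
    gaps = [optimality_gap(cost, ref) for _, cost, ref, _ in results]
    return {
        "mean_gap_pct": sum(gaps) / len(gaps),
        "max_gap_pct": max(gaps),
        "total_time_s": sum(seconds for *_, seconds in results),
    }

# Hypothetical numbers, for illustration only.
report = summarize([("inst-a", 105.0, 100.0, 1.5), ("inst-b", 200.0, 200.0, 2.5)])
print(report)  # {'mean_gap_pct': 2.5, 'max_gap_pct': 5.0, 'total_time_s': 4.0}
```

<p>Reporting the worst-case gap alongside the mean helps expose methods that do well on average but fail badly on a few instances.</p>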

<figure><center>
  <img src="https://iclr.iro.umontreal.ca/92ee5ed8-dfe1-48e9-9dc1-022b0264df52_1641915272/public/images/2021-12-01-deep-learning-for-routing-problems/ml4co.png" width="25%" />
  <figcaption><b>Figure 10:</b> Community contests such as <a href="https://www.ecole.ai/2021/ml4co-competition/">ML4CO</a> are a great initiative to track progress. (Source: ML4CO website).</figcaption>
</center></figure>

<p>In general, it is encouraging to see recent papers <strong>embrace real-world benchmarks</strong> such as <a href="http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/">TSPLib</a> and <a href="http://vrp.atd-lab.inf.puc-rio.br/index.php/en/">CVRPLib</a>. These routing problem collections contain instances of varied sizes and structures, many derived from real-world cities and applications, along with their optimal or best-known solutions, and have become the standard testbed for new solvers in the OR community. At the same time, we must also be wary of ‘overfitting’ to the top <code class="language-plaintext highlighter-rouge">n</code> TSPLib instances that every other paper is using. <strong>Regular competitions</strong> on freshly curated real-world datasets, such as the <a href="https://www.ecole.ai/2021/ml4co-competition/">ML4CO competition at NeurIPS 2021</a>, are another great initiative to track progress in the intersection of deep learning and routing problems.</p>
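<p>One practical detail when comparing against TSPLib: each instance specifies how edge weights are computed, and for the common <code class="language-plaintext highlighter-rouge">EUC_2D</code> type the TSPLIB95 documentation defines the weight as the Euclidean distance rounded to the nearest integer. The published optimal tour lengths use these integer weights, so costs computed with raw floating-point distances are not directly comparable to them. The helper names below are our own.</p>

```python
import math

def euc_2d(a, b):
    """TSPLIB EUC_2D edge weight: Euclidean distance rounded to the nearest integer."""
    return int(math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) + 0.5)

def tsplib_tour_length(coords, tour):
    """Closed-tour length using integer TSPLIB weights."""
    return sum(euc_2d(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

coords = [(0, 0), (1, 1), (2, 0)]
print(tsplib_tour_length(coords, [0, 1, 2]))  # 1 + 1 + 2 = 4
print(sum(math.dist(coords[i], coords[(i + 1) % 3]) for i in range(3)))  # about 4.83
```

<p>Adding 0.5 and truncating matches TSPLIB's rounding for non-negative distances; Python's built-in <code class="language-plaintext highlighter-rouge">round</code> rounds halves to the nearest even integer and would occasionally disagree.</p>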

<blockquote>
  <p>We highly recommend the engaging panel discussion and talks from ML4CO, NeurIPS 2021, available on <a href="https://youtube.com/playlist?list=PLYWmzh0Y6EOZz3PtMxfaqEnRsfW-TF4nf">YouTube</a>.</p>
</blockquote>

<hr />

<h2 id="summary">Summary</h2>

<p>This blog post presents a <strong>neural combinatorial optimization pipeline</strong> that unifies recent papers on deep learning for routing problems into a single framework. Through the lens of our framework, we then analyze and dissect recent advances, and speculate on directions for future research.</p>

<p>The following table highlights in <span style="color:red">Red</span> the major innovations and contributions of the recent papers covered in the previous sections.</p>

<table>
  <thead>
    <tr>
      <th>Paper</th>
      <th>Problem Definition</th>
      <th>Graph Embedding</th>
      <th>Solution Decoding</th>
      <th>Solution Search</th>
      <th>Policy Learning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://arxiv.org/abs/2010.16011">Kwon et al., 2020</a></td>
      <td>Full Graph</td>
      <td>Transformer Encoder</td>
      <td>Attention (AR)</td>
      <td>Sampling</td>
      <td><span style="color:red">POMO Rollout (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2012.10658">Fu et al., 2020</a></td>
      <td><span style="color:red">Sparse Sub-graphs</span></td>
      <td>Residual Gated GCN</td>
      <td>MLP Heatmap (NAR)</td>
      <td><span style="color:red">MCTS</span></td>
      <td>Imitation (SL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2102.11756">Kool et al., 2021</a></td>
      <td>Sparse Graph</td>
      <td>Residual Gated GCN</td>
      <td>MLP Heatmap (NAR)</td>
      <td><span style="color:red">Dynamic Programming</span></td>
      <td>Imitation (SL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2110.03595">Ouyang et al., 2021</a></td>
      <td>Full Graph + <span style="color:red">Data Augmentation</span></td>
      <td><span style="color:red">Equivariant GNN</span></td>
      <td>Attention (AR)</td>
      <td>Sampling + Local Search</td>
      <td><span style="color:red">Policy Rollout (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/1912.05784">Wu et al., 2021</a></td>
      <td>Sequence + <span style="color:red">Position</span></td>
      <td>Transformer Encoder</td>
      <td><span style="color:red">Transformer Decoder (L2I)</span></td>
      <td>Local Search</td>
      <td>Actor-critic (RL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2004.01608">da Costa et al., 2020</a></td>
      <td>Sequence</td>
      <td>GCN</td>
      <td><span style="color:red">RNN + Attention (L2I)</span></td>
      <td>Local Search</td>
      <td>Actor-critic (RL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2110.02544">Ma et al., 2021</a></td>
      <td>Sequence + <span style="color:red">Cyclic Position</span></td>
      <td><span style="color:red">Dual Transformer Encoder</span></td>
      <td><span style="color:red">Dual Transformer Decoder (L2I)</span></td>
      <td>Local Search</td>
      <td><span style="color:red">PPO (RL)</span></td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2110.07983">Xin et al., 2021</a></td>
      <td>Sparse Graph</td>
      <td>GAT</td>
      <td>MLP Heatmap (NAR)</td>
      <td><span style="color:red">LKH Algorithm</span></td>
      <td>Imitation (SL)</td>
    </tr>
    <tr>
      <td><a href="https://arxiv.org/abs/2110.05291">Hudson et al., 2021</a></td>
      <td><span style="color:red">Sparse Dual Graph</span></td>
      <td>GAT</td>
      <td>MLP Heatmap (NAR)</td>
      <td><span style="color:red">Guided Local Search</span></td>
      <td>Imitation (SL)</td>
    </tr>
  </tbody>
</table>

<hr />

<p>As a final note, we would like to say that the <strong>more profound motivation</strong> of neural combinatorial optimization may NOT be to outperform classical approaches on well-studied routing problems. Neural networks may be used as a general tool for <strong>tackling previously unencountered NP-hard problems</strong>, especially those for which designing heuristics is non-trivial. We are excited about recent applications of neural combinatorial optimization for <a href="https://www.nature.com/articles/s41586-021-03544-w">designing computer chips</a>, <a href="https://arxiv.org/abs/2109.10883">optimizing communication networks</a>, and <a href="https://openreview.net/forum?id=1QxveKM654">genome reconstruction</a>, and are looking forward to more in the future!</p>


</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ")
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(/\s+/g, "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>






      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
