<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      From Molecular Dynamics to MeshGraphNets &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/2021/12/01/meshgraphnets/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">From Molecular Dynamics to MeshGraphNets</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#gnn"> GNN </a>
  
    <a class="content-tag" href="/tags/#graph-network"> Graph Network </a>
  
    <a class="content-tag" href="/tags/#mesh-based-simulations"> Mesh-based simulations </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p>Blog post on the paper “Learning Mesh-Based Simulation with Graph Networks” aka MeshGraphNets (<cite><a href="https://arxiv.org/abs/2010.03409">Pfaff et al., 2021</a></cite>).</p>

<p>Reference <a href="https://openreview.net/forum?id=roNqYL0_XP">paper</a>, <a href="https://sites.google.com/view/meshgraphnets">website</a> and <a href="https://github.com/deepmind/deepmind-research/tree/master/meshgraphnets">github</a>.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/TimelineBanner.svg" alt="Timeline Banner" style="display:block;margin-left:auto;margin-right:auto;width:100%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Historical timeline of methods leading up to MeshGraphNets.</figcaption>
</figure>

<hr />

<p>Numerical solvers that model complex physical systems, such as fluids, elastic materials, or electromagnetic fields, are based on the discretization of the underlying partial differential equations (PDEs). They use methods such as Finite Elements, Finite Volumes, or Finite Differences, which are computationally expensive and require complete knowledge of the governing PDEs and boundary conditions. In some cases, free parameters can be estimated from experimental data, but this requires solving ill-posed inverse problems, which are even more computationally demanding.</p>

<p>Over the last few years, Deep Learning has emerged as an alternative to classical methods, providing faster predictions and allowing quick exploration of large design spaces. Deep Learning is especially well suited for problems in which the physics (PDEs and boundary conditions) is only partially known and must be combined with sparse measurement data. However, matching the high accuracy of classical methods has proven to be challenging.</p>

<p>A core reason for this gap in accuracy is the emphasis that classical methods, such as Finite Elements, place on their mesh discretization. Classical methods use dynamic mesh adaptation as the basis for both a priori and a posteriori error control algorithms that guarantee accurate simulation results. Despite their importance in computational mechanics, nonuniform or unstructured meshes had not received much attention in Machine Learning for Physics until DeepMind proposed the MeshGraphNets framework. This framework uses Graph Networks to simulate dynamical systems on adaptive mesh representations. It learns an input-output map over the mesh, which can then be remeshed adaptively during rollout to increase the mesh resolution in regions that require it (in the paper, these regions are determined by a learned sizing field).</p>

<p>In this blog post, we discuss the MeshGraphNets paper and its predecessor through the lens of the graph-learning paradigm. We argue that molecular dynamics and smoothed particle hydrodynamics are the ancestors of all graph-based, learned particle simulators, and we show how graph-based approaches naturally extend to meshes. Then, we compare MeshGraphNets to other approaches, both graph-based and not. Finally, we conclude by presenting the strengths and weaknesses of the model, directions for future work, and a code snippet of the core algorithm written in JAX.</p>

<hr />
<h2 id="table-of-contents">Table of Contents</h2>
<ul>
  <li><a href="#notation">Notation</a></li>
  <li><a href="#basics-graph-networks">Basics: Graph Networks</a></li>
  <li><a href="#meshgraphnets-and-its-predecessor">MeshGraphNets and its Predecessor</a>
    <ul>
      <li><a href="#graph-network-based-simulators-gns">Graph Network-based Simulators (GNS)</a></li>
      <li><a href="#meshgraphnets">MeshGraphNets</a></li>
    </ul>
  </li>
  <li><a href="#from-md-to-gns">From MD to GNS</a>
    <ul>
      <li><a href="#molecular-dynamics">Molecular Dynamics</a></li>
      <li><a href="#smoothed-particle-hydrodynamics">Smoothed Particle Hydrodynamics</a></li>
      <li><a href="#comparison">Comparison</a></li>
    </ul>
  </li>
  <li><a href="#from-gns-to-meshgraphnets">From GNS to MeshGraphNets</a>
    <ul>
      <li><a href="#meshes">Meshes</a></li>
      <li><a href="#fluid-particle-model">Fluid-Particle Model</a></li>
      <li><a href="#time-extrapolation-and-training-noise">Time Extrapolation and Training Noise</a></li>
      <li><a href="#remeshing">Remeshing</a></li>
    </ul>
  </li>
  <li><a href="#comparison-to-related-work">Comparison to Related Work</a>
    <ul>
      <li><a href="#graph-network-based">Graph Network-based</a></li>
      <li><a href="#comparable-paradigms">Comparable Paradigms</a></li>
    </ul>
  </li>
  <li><a href="#summary-and-practical-considerations">Summary and Practical Considerations</a></li>
  <li><a href="#references">References</a></li>
</ul>

<hr />

<h1 id="notation">Notation</h1>

<p>Our nomenclature follows the MeshGraphNets paper, with some minor modifications and extensions. We underline the distinction between mesh and graph attributes (not to be confused with mesh-space and world-space!). In the paper, mesh nodes and edges have the same notation as graph nodes and edges. We believe this abuse of notation might be confusing to some readers and thus put a hat ($\hat{\square}$) above the mesh nodes and edges.</p>

<table>
  <thead>
    <tr><th>Symbol</th>
      <th>Meaning</th></tr>
  </thead>
  <tbody>
    <tr style="border-top:2px solid black"><th colspan="2" text-align="center">Mesh</th></tr>
    <tr><td>$M^t = (\hat{V}, \hat{E}^M)$</td>
      <td>Mesh at time $t$.</td></tr>
    <tr><td>$\hat{\mathbf{v}}_i \in \hat{V}$</td>
      <td>Mesh node. Contains: <br />
      1) <i> Lagrangian</i>: $\{\mathbf{u}_{i}, \mathbf{x}_{i}, \mathbf{q}_{i}, \mathbf{n}_{i} \}$. <br />
      2) <i> Eulerian</i>: $\{\mathbf{u}_{i}, \mathbf{q}_{i}, \mathbf{n}_i\}$. <br /> </td></tr>    
    <tr><td>$\hat{\mathbf{e}}_{ij}^M \in \hat{E}^M$</td>
      <td>Mesh edge; together, the mesh edges encode the mesh connectivity (adjacency matrix). </td></tr> 
    <tr><td>$\mathbf{u}_i \in U, \mathbf{u}_{ij}, \left|  \mathbf{u}_{ij} \right|$</td>
      <td>Mesh-space coordinate, displacement $\mathbf{u}_{ij}=\mathbf{u}_{i}-\mathbf{u}_{j}$, and its norm.</td></tr>
    <tr><td>$\mathbf{x}_i \in X, \mathbf{x}_{ij}, \left|  \mathbf{x}_{ij} \right|$</td>
      <td>World-space coordinate, displacement $\mathbf{x}_{ij}=\mathbf{x}_{i}-\mathbf{x}_{j}$, and its norm.</td></tr>
    <tr><td>$\mathbf{q}_i$</td>
      <td>Dynamical features, e.g. velocity in <i> Eulerian </i> systems, momentum, density.</td></tr>
    <tr><td>$\mathbf{n}_i$</td>
      <td>Node type as one-hot encoding, e.g. for boundary conditions. </td></tr>
    <tr><td>$\tilde{\square}$</td>
      <td>Simulated quantity including errors. </td></tr>
    <tr style="border-top:2px solid black"><th colspan="2" text-align="center">Graph</th></tr>
    <tr><td>$G=(V, E^M, E^W)$</td>
      <td>Multigraph. $E^W$ only in <i>Lagrangian</i> systems.</td></tr>
    <tr><td>$\mathbf{v}_i \in V$</td>
      <td>Graph node. Contains MLP encoding $\epsilon^V(\{\mathbf{q}_i, \mathbf{n}_i \})$. In the cloth experiment $(\mathbf{x}^t_i-\mathbf{x}^{t-1}_i)$ is included as well.</td></tr>
    <tr><td>$\mathbf{e}_{ij} \in E$</td>
      <td>General graph edge.</td></tr> 
    <tr><td>$\mathbf{e}_{ij}^M \in E^M$</td>
      <td>Mesh edge modeling internal dynamics. Contains MLP encoding: <br />
      1) <i> Lagrangian</i>: $\epsilon^M(\{\mathbf{u}_{ij}, \left|\mathbf{u}_{ij}\right|, \mathbf{x}_{ij}, \left|\mathbf{x}_{ij}\right| \})$. <br />
      2) <i> Eulerian</i>: $\epsilon^M(\{\mathbf{u}_{ij}, \left|\mathbf{u}_{ij}\right|\})$. <br /> </td></tr>    
    <tr><td>$\mathbf{e}_{ij}^W \in E^W$</td>
      <td>World edges modeling external dynamics, e.g. collision, contact. Contains an MLP encoding $\epsilon^W(\{ \mathbf{x}_{ij}, \left|\mathbf{x}_{ij}\right| \})$. Applicable only if: <br /> 1) <i> Lagrangian </i> system. <br />  2) $\left|\mathbf{x}_{ij}\right| &lt; r_W$ with interaction radius $r_W$.  <br />  3) Excluding nodes already connected in the mesh. </td></tr>
    <tr><td>$\epsilon^V, \epsilon^M, \epsilon^W$</td>
      <td>MLPs used to encode $\mathbf{v}_i, \mathbf{e}_{ij}^M$ and $\mathbf{e}_{ij}^W$. Output size 128.</td></tr>
    <tr><td>$f^V, f^E, f^M, f^W$</td>
      <td>MLPs used to update nodes, general edges, mesh edges and world edges in each GN layer.</td></tr>
    <tr><td>$\delta^V$</td>
      <td>MLP used as decoder.</td></tr>
    <tr><td>$\mathbf{p}_i$</td>
      <td>Output feature vector. Contains one or more of: <br />
      1) Time derivatives of the spatial coordinates, e.g. velocity, acceleration. <br />
      2) Derivatives of $\mathbf{q}_i$. <br />
      3) Some additional learned quantity, e.g. pressure, stress, sizing field.</td></tr>
    <tr><td>$\mathbf{y}_i \in Y$</td>
      <td>Output feature vector containing only time derivatives of the spatial coordinates (special case of $\mathbf{p}_i$).</td></tr>
    <tr><td>$r_k, s_k$</td>
      <td>Receiver and sender node indices of edge $k$.</td></tr>
  </tbody>
</table>
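<p>To make the edge inputs in the table concrete, here is a minimal JAX sketch of the raw mesh-edge features that are fed into the encoder $\epsilon^M$. The function name <code>mesh_edge_inputs</code> and the index arrays <code>i</code>, <code>j</code> are our own illustrative choices, not taken from the official implementation, and the MLP encoder itself is omitted.</p>

```python
import jax.numpy as jnp

def mesh_edge_inputs(u, x, i, j, lagrangian=True):
    """Raw inputs of the mesh-edge encoder eps^M for edges e_ij (hypothetical helper).

    u: (N, d_u) mesh-space coordinates, x: (N, d_x) world-space coordinates,
    i, j: (E,) integer arrays; edge k connects node i[k] with node j[k].
    """
    u_ij = u[i] - u[j]                                   # mesh-space displacement u_i - u_j
    feats = [u_ij, jnp.linalg.norm(u_ij, axis=-1, keepdims=True)]
    if lagrangian:                                       # world-space terms only for Lagrangian systems
        x_ij = x[i] - x[j]                               # world-space displacement x_i - x_j
        feats += [x_ij, jnp.linalg.norm(x_ij, axis=-1, keepdims=True)]
    return jnp.concatenate(feats, axis=-1)

# A single cloth triangle: 2D mesh-space (rest) coordinates, 3D world-space coordinates
u = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = jnp.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
i = jnp.array([0, 1, 0])
j = jnp.array([1, 2, 2])
feats = mesh_edge_inputs(u, x, i, j)
print(feats.shape)  # (3, 7): 2 + 1 mesh-space and 3 + 1 world-space entries per edge
```

<p>For an Eulerian system (<code>lagrangian=False</code>), only the mesh-space terms remain, so each edge in this example would carry 3 instead of 7 raw features.</p>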

<hr />

<h1 id="basics-graph-networks">Basics: Graph Networks</h1>

<p>Graph Networks (<abbr title="Graph Network">GN</abbr>s) (<cite><a href="https://arxiv.org/abs/1806.01261">Battaglia et al., 2018</a></cite>) are a framework that generalizes graph-based learning, and specifically the Graph Neural Network (<abbr title="Graph Neural Network">GNN</abbr>) architecture by <cite><a href="https://ieeexplore.ieee.org/document/4700287">Scarselli et al., 2008</a></cite>. The framework defines a class of functions that take a graph as input, perform computations on it (e.g. convolution-like message passing), and return a graph again.</p>

<p>In the original GN paper, the input was defined as a 3-tuple, with one of the 3 elements being a global feature. In MeshGraphNets and its predecessor, the authors made an algorithmic simplification by concatenating this global feature to the node features, resulting in the following 2-tuple:</p>

\[\begin{align*}
G = (V,E)
\end{align*}\]

<p>Here, $V$ is the set of node features, which contains the usual per-node features together with the aforementioned global feature. In the case of particle dynamics, usual node features could be position, velocity, mass, etc. The global feature could be anything from the potential energy of the system, through material properties such as density or viscosity, to the gravitational constant.</p>

\[\begin{align*}
V = \{\mathbf{v}_{i}\}_{i=1:N^v}
\end{align*}\]

<p>Finally, $E$ is the set of edge features between connected nodes. If we denote the edge features by $\mathbf{e}_k$ and the indices of the receiver and sender nodes by $r_k$ and $s_k$, we get:</p>

\[\begin{align*}
E = \{ (\mathbf{e}_{k}, r_{k},s_{k}) \}_{k=1:N^e}.
\end{align*}\]

<p><strong>Algorithm</strong></p>

<p>We briefly discuss the GN algorithm as used in MeshGraphNets, i.e. without explicit global features.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/GN_pseudocode.png" alt="Graph Network Block" style="display:block;margin-left:auto;margin-right:auto;width:70%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.1: Graph Network block pseudocode (Adapted from: <cite><a href="https://arxiv.org/abs/1806.01261">Battaglia et al., 2018</a></cite>).</figcaption>
</figure>

<p>The main computations are the <strong>update functions</strong> $f$ and the <strong>aggregation function</strong> $\textrm{aggr}^E$. The update functions are applied to each node and each edge individually and return updated features. The aggregation function combines the updated representations of all edges that point to a given (receiver) node in the graph.</p>

<p><em>In the spirit of a ‘general framework’, it is a matter of choice which functions to use for update and aggregation.</em></p>

<p>For the most part, MLPs are used for the updates, while the summation function (or any other permutation-invariant function like min, max or average) is used to aggregate the updated entities.</p>
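<p>The steps above can be sketched as a single GN block in JAX. This is a minimal sketch, assuming plain ReLU MLPs for the edge and node updates and sum aggregation via <code>jax.ops.segment_sum</code>; the helper names <code>init_mlp</code>, <code>mlp</code> and <code>gn_block</code> are hypothetical, and details such as residual connections and normalization layers are omitted.</p>

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Hypothetical helper: random (W, b) pairs for a plain MLP with the given layer sizes."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, (m, n) in zip(keys, zip(sizes[:-1], sizes[1:]))]

def mlp(params, h):
    for W, b in params[:-1]:
        h = jax.nn.relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b                       # linear output layer

def gn_block(edge_params, node_params, v, e, senders, receivers):
    # 1) Edge update: e'_k = f^E(e_k, v_{r_k}, v_{s_k})
    e_new = mlp(edge_params, jnp.concatenate([e, v[receivers], v[senders]], axis=-1))
    # 2) Aggregation: sum of updated edges per receiver node (permutation-invariant)
    agg = jax.ops.segment_sum(e_new, receivers, num_segments=v.shape[0])
    # 3) Node update: v'_i = f^V(v_i, aggregated messages of node i)
    v_new = mlp(node_params, jnp.concatenate([v, agg], axis=-1))
    return v_new, e_new

N, E, D = 4, 3, 8
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
v = jax.random.normal(k1, (N, D))
e = jax.random.normal(k2, (E, D))
senders, receivers = jnp.array([0, 1, 2]), jnp.array([1, 2, 3])
edge_params = init_mlp(k3, [3 * D, 16, D])
node_params = init_mlp(k4, [2 * D, 16, D])
v1, e1 = gn_block(edge_params, node_params, v, e, senders, receivers)
print(v1.shape, e1.shape)  # (4, 8) (3, 8)
```

<p>Node 0 receives no edges in this toy graph, so its aggregated message is all zeros. Using <code>segment_sum</code> over receiver indices avoids materializing a dense adjacency matrix, which matters for graphs with tens of thousands of nodes.</p>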

<p>Next, let’s look at <em>what makes GNs special for building physics-informed learning models</em>.</p>

<p><strong>Physical biases</strong></p>

<ol>
<li> Graph-based algorithms extend conventional state-of-the-art models like the CNN, which works only on regular grids, to non-regular grids. </li>
<li>To learn physical systems, a model has to respect physical laws. These may include:
  <ul>
    <li>Spatial Equivariance/Invariance,</li>
    <li>Local Interactions,</li>
    <li>Superposition Principle,</li>
    <li>Differential Equation.</li>
  </ul>
</li>
</ol>

<p>By default, most of these principles are captured, since the framework represents information using concepts from graph theory. For example, by using graphs we get permutation equivariance (relabeling the nodes relabels the outputs accordingly) for free, and interactions are constrained to local neighborhoods. Other principles, like translation invariance, can easily be incorporated by using the relative positions between neighboring nodes instead of their absolute coordinates. Concerning the superposition principle of forces in particle-based simulations, summing the force representations (edge features) during aggregation makes more sense than averaging them or taking the maximum.</p>
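<p>Both properties can be checked numerically in a few lines. The following sketch (helper names are our own) builds edge features from relative positions only and aggregates them with a sum, then verifies that a global shift of all particles leaves the features unchanged and that reordering the edge list leaves the aggregated messages unchanged.</p>

```python
import jax
import jax.numpy as jnp

def relative_edge_features(pos, senders, receivers):
    """Edge features from relative displacements only (no absolute coordinates)."""
    d = pos[senders] - pos[receivers]
    return jnp.concatenate([d, jnp.linalg.norm(d, axis=-1, keepdims=True)], axis=-1)

def sum_messages(messages, receivers, num_nodes):
    """Sum aggregation per receiver, mirroring the superposition of pairwise forces."""
    return jax.ops.segment_sum(messages, receivers, num_segments=num_nodes)

pos = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
senders = jnp.array([1, 2, 0, 0])
receivers = jnp.array([0, 0, 1, 2])

# Translation invariance: shifting every particle by the same vector changes nothing
f = relative_edge_features(pos, senders, receivers)
f_shifted = relative_edge_features(pos + jnp.array([5.0, -3.0]), senders, receivers)
print(jnp.allclose(f, f_shifted))      # True

# Permutation invariance of the aggregation: the order of the edge list is irrelevant
perm = jnp.array([3, 1, 0, 2])
agg = sum_messages(f, receivers, 3)
agg_perm = sum_messages(f[perm], receivers[perm], 3)
print(jnp.allclose(agg, agg_perm))     # True
```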

<p>Helpful references on the topic of Geometric Deep Learning with further details on graphs, including invariance groups and physical intuitions, are the recent review by <cite><a href="https://arxiv.org/abs/2104.13478">Bronstein et al., 2021</a></cite>, and the PhD thesis of <cite><a href="https://dare.uva.nl/search?identifier=0f7014ae-ee94-430e-a5d8-37d03d8d10e6">Cohen, 2021</a></cite>.</p>

<hr />

<h1 id="meshgraphnets-and-its-predecessor">MeshGraphNets and its Predecessor</h1>

<p>In the following sections, we build up the argument that MeshGraphNets originate from molecular dynamics. The connection becomes clearer, however, if we first take a step back and look at MeshGraphNets’ predecessor, GNS.</p>

<h2 id="graph-network-based-simulators-gns">Graph Network-based Simulators (GNS)</h2>

<p>The paper “Learning to Simulate Complex Physics with Graph Networks” by <cite><a href="http://proceedings.mlr.press/v119/sanchez-gonzalez20a.html">Sanchez-Gonzalez et al., 2020</a></cite> uses an encoder-processor-decoder architecture to define the “Graph Network-based Simulators” (GNS) framework. With its iterative application of Graph Networks, this work improves upon “Interaction Networks for Learning about Objects, Relations and Physics” by <cite><a href="https://proceedings.neurips.cc/paper/2016/hash/9657c1fffd38824e5ab0472e022e577e-Abstract.html">Battaglia et al., 2016</a></cite>, published four years earlier, which was among the first works to show that graph-based networks can learn the physics of interacting objects such as colliding particles. GNS showed that GN-based approaches can be used for particle dynamics in large-scale systems (up to 85k particles). Its working principle is summarized as follows.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/GNS_scheme.png" alt="GNS scheme" style="display:block;margin-left:auto;margin-right:auto;width:100%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.2: GNS scheme (Image source: <cite><a href="http://proceedings.mlr.press/v119/sanchez-gonzalez20a.html">Sanchez-Gonzalez et al., 2020</a></cite>).</figcaption>
</figure>

<p>$X^{t}$ represents the state of the particle system at time $t$.
With an initial time of $t_0$ and final time $t_K$, the dynamics of the system can be represented as</p>

\[\textbf{X}^{t_{0:K}} = \{X^{t_0}, X^{t_1}, \ldots, X^{t_K}\}.\]

<p>The task is then to learn the differential operator  $ d_{\theta}$, which approximates the dynamics:</p>

\[d_{\theta} : X^{t_k} \rightarrow Y^{t_k}\]

\[X^{t_{k+1}} = \text{Update} \{ X^{t_k}, d_{\theta} \}\]

<p>To do so, an encoder-processor-decoder architecture is used:</p>

\[\text{Encoder} \rightarrow \text{Processor} \rightarrow \text{Decoder}\]

<p>Below, we break down each component, as well as the loss and the update step used at inference time.</p>

<p><strong>Encoder:</strong>
Takes as input the particles $X^{t_k}$ at time $t_k$ (and potentially a history $h$ of somewhere between $1$ and $10$ previous states, which we omit in the equation below for simplicity) and encodes them into a graph.</p>

\[G^{0} = \text{Encoder}(X^{t_k})\]

<p>Here, $G = (V,E)$. The paper implements the global features as part of the node features, thus no global features appear explicitly in $G$. The edges are obtained by connecting nodes that are within some interaction radius $r_W$ of each other.</p>
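<p>A brute-force sketch of this radius-graph construction, assuming a dense $O(N^2)$ pairwise-distance matrix, which is fine for illustration; practical particle simulators typically rely on cell lists or neighbor lists instead. The function name <code>radius_graph</code> is hypothetical.</p>

```python
import jax.numpy as jnp

def radius_graph(pos, radius):
    """All directed pairs (s, r) with |x_s - x_r| < radius, without self-loops (brute force, O(N^2))."""
    dist = jnp.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    mask = (dist < radius) & ~jnp.eye(pos.shape[0], dtype=bool)
    senders, receivers = jnp.nonzero(mask)
    return senders, receivers

pos = jnp.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]])
senders, receivers = radius_graph(pos, radius=1.0)
print(list(zip(senders.tolist(), receivers.tolist())))  # [(0, 1), (1, 0)]
```

<p>Because <code>jnp.nonzero</code> returns arrays whose size depends on the data, this function runs outside <code>jax.jit</code> as written; under <code>jit</code>, one would have to pass a fixed <code>size</code> (an upper bound on the number of edges) and pad the result.</p>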

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/gn_enc.png" alt="GNS Encoder" style="display:block;margin-left:auto;margin-right:auto;width:90%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.3: GNS Encoder.</figcaption>
</figure>

<p><strong>Processor:</strong>
It is a multilayer Graph Network in which each layer performs one round of message passing. The exact number of layers is a hyperparameter; 5-6 worked well for this paper.</p>

\[G^{l} = \text{GN}^l (G^{l-1})\]

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/gn_pr.png" alt="GNS Processor" style="display:block;margin-left:auto;margin-right:auto;width:90%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.4: GNS Processor.</figcaption>
</figure>

<p><strong>Decoder:</strong>
Lastly, the output graph $G^{L}$ is decoded back to physical space, in which it represents the acceleration of particles.</p>

\[\tilde{\ddot{X}} = Y=\text{Decoder}(G^L)\]

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/gn_dec.png" alt="GNS Decoder" style="display:block;margin-left:auto;margin-right:auto;width:90%;" />
  <figcaption style="text-align: center; opacity:70%;font-style: italic;">Fig.5: GNS Decoder.</figcaption>
</figure>

<p><strong>Loss:</strong>
During training, the model is not rolled forward to the subsequent state $\tilde{X}$, because we are interested in approximating the acceleration directly. The $\tilde{\square}$ denotes that a quantity is the output of a learned model and therefore contains errors. Thus, we compute the loss w.r.t. the target acceleration $\mathbf{p}^{t}$.</p>

\[\text{Loss} = \text{MSE}\left(\tilde{\ddot{X}}, \mathbf{p}^{t}\right)\]
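<p>Where does the target acceleration come from? Assuming that velocities are finite differences of positions, $\dot{X}^t = X^t - X^{t-1}$, and using the $\delta t = 1$ update equations of the next paragraph, $\ddot{X}^t = \dot{X}^{t+1} - \dot{X}^t = X^{t+1} - 2X^t + X^{t-1}$. The following sketch computes this target from three consecutive positions together with the MSE loss; the function names are hypothetical, and details such as feature normalization are omitted.</p>

```python
import jax.numpy as jnp

def target_acceleration(x_prev, x_curr, x_next):
    """Finite-difference acceleration with dt = 1: p^t = x^{t+1} - 2 x^t + x^{t-1}."""
    return x_next - 2.0 * x_curr + x_prev

def mse_loss(pred_acc, target_acc):
    """Mean squared error over all particles and spatial dimensions."""
    return jnp.mean((pred_acc - target_acc) ** 2)

# Free fall with unit gravity, x(t) = -t^2 / 2, sampled at t = 0, 1, 2
x0, x1, x2 = jnp.array([0.0]), jnp.array([-0.5]), jnp.array([-2.0])
p = target_acceleration(x0, x1, x2)
print(p)                               # -1, the constant acceleration
print(mse_loss(jnp.array([-1.0]), p))  # 0, a perfect prediction
```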

<p><strong>Update:</strong>
This extra step is only used during inference. In the simplest case, the velocity $\tilde{\dot{X}}$ is updated from the acceleration $\tilde{\ddot{X}}$ with an explicit Euler step, after which the position $\tilde{X}^{t+1}$ is updated using the new velocity. Since the position step uses the already-updated velocity, this scheme is, strictly speaking, the semi-implicit (symplectic) Euler method. The equations below show that the network simply learns the force (per unit mass) acting on a Newtonian particle, whose trajectory is integrated with time step $\delta t=1$. This $\delta t$ is not physical time, since the physical time is set by the time between training samples. To see the equivalence between the first equation and Newtonian mechanics, consider $a=\tilde{\ddot{X}}$, $v^t= \tilde{\dot{X}}^{t}$ and $v^{t+\delta t}=\tilde{\dot{X}}^{t+1}$.</p>

\[\begin{align}
\tilde{\dot{X}}^{t+1} &amp;= \tilde{\dot{X}}^{t} +  \tilde{\ddot{X}}^{t}  \qquad \Longleftrightarrow \qquad F = m a =m \frac{d v}{d t} \approx m \frac {v^{t+\delta t}-v^t}{\delta t} \\

\tilde{X}^{t+1} &amp;= \tilde{X}^{t} +  \tilde{\dot{X}}^{t+1}
\end{align}\]
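<p>The update equations translate into a short JAX rollout loop. In this sketch, <code>accel_fn</code> is a hypothetical stand-in for the trained Encoder-Processor-Decoder; we plug in constant gravity so that the resulting trajectory can be checked by hand.</p>

```python
import jax
import jax.numpy as jnp

def update(x, v, a):
    """One inference-time update with dt = 1, as in the equations above."""
    v_next = v + a        # velocity update from the predicted acceleration
    x_next = x + v_next   # position update uses the *new* velocity (semi-implicit Euler)
    return x_next, v_next

def rollout(accel_fn, x0, v0, num_steps):
    """Autoregressive rollout; accel_fn stands in for the learned simulator."""
    def step(carry, _):
        x, v = carry
        x, v = update(x, v, accel_fn(x, v))
        return (x, v), x
    _, positions = jax.lax.scan(step, (x0, v0), None, length=num_steps)
    return positions  # shape (num_steps, N, dim)

gravity = lambda x, v: jnp.full_like(x, -1.0)   # hypothetical "model": constant acceleration
traj = rollout(gravity, jnp.zeros((1, 1)), jnp.zeros((1, 1)), num_steps=3)
print(traj[:, 0, 0])  # [-1. -3. -6.]
```

<p>Since each predicted state is fed back as the next input, errors in <code>accel_fn</code> compound over the rollout, which is why one-step training alone does not guarantee stable long rollouts.</p>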

<p><br /></p>

<p>The GNS model performs well on fluid simulations and fluid-solid interactions. However, it fails to adequately model deforming meshes such as thin shells. This motivated the follow-up work, MeshGraphNets.</p>

<h2 id="meshgraphnets">MeshGraphNets</h2>

<p>The MeshGraphNets framework was presented at ICLR 2021 in the paper “Learning Mesh-Based Simulation with Graph Networks” by <cite><a href="https://openreview.net/forum?id=roNqYL0_XP">Pfaff et al., 2021</a></cite>. It extends the GNS framework by supplementing GNS’ Euclidean spatial coordinates with a set of edges that define a mesh, on which the different interactions can be learned. Thus, the problem of finding a universal interaction function is split into two separate problems: finding the interactions along “mesh”-type edges and along “collision”-type edges. Mathematicians and physicists speak of the <a href="https://en.wikipedia.org/wiki/Superposition_principle">superposition principle</a>, i.e. expressing a complicated effect as the sum of multiple simpler contributions, which is precisely what is done here.</p>

<p>A further contribution of the paper is the introduction of adaptive remeshing (common engineering practice) to Graph Networks. As a result, MeshGraphNets can model a wider range of dynamics, as illustrated by the following cases, which cannot be handled by the GNS framework:</p>

<ol>
  <li>If a mesh deforms such that two nodes that are distant on the mesh meet in world space, GNS is likely to become unstable and diverge, see <a href="https://sites.google.com/view/meshgraphnets#h.vip6m89fxz2">video</a>.</li>
  <li>If a mesh deforms dynamically and adaptive remeshing is omitted, errors accumulate quickly due to missing high-frequency information.</li>
  <li>Extending the output feature vector from $\mathbf{y}_i$ to $\mathbf{p}_i$ (see <a href="#notation">Notation</a>) with additional auxiliary variables allows the model to be trained to predict more than just positions, e.g. a stress field, which extends its range of use cases.</li>
</ol>

<p>Next, we briefly summarize the working principles of MeshGraphNets.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/MeshGraphNets_scheme.png" alt="MeshGraphNets" style="display:block;margin-left:auto;margin-right:auto;width:100%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.6: MeshGraphNets scheme (Image source: <cite><a href="https://openreview.net/forum?id=roNqYL0_XP">Pfaff et al., 2021</a></cite>).</figcaption>
</figure>

<p>The similarity to the GNS pipeline is apparent (see Fig.2). Here, we only highlight the differences between the two methods.</p>

<p><strong>Encoder:</strong>
Takes as additional input a predefined mesh or, during inference, the mesh from the previous step. Then, if desired, remeshing is applied. Finally, when constructing the world graph (equivalent to the GNS graph), node pairs that are already connected on the mesh are excluded.</p>
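<p>The world-edge rule (close in world space, but not already connected on the mesh) can be sketched as follows. This is a brute-force illustration under our own assumptions: the function name <code>world_edges</code> is hypothetical, mesh edges are given as sender/receiver index arrays, and the toy geometry is a folded strip chosen so that the result is easy to verify.</p>

```python
import jax.numpy as jnp

def world_edges(x, mesh_senders, mesh_receivers, r_w):
    """Directed world edges: node pairs closer than r_w in world space that are NOT mesh neighbors."""
    n = x.shape[0]
    dist = jnp.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    close = (dist < r_w) & ~jnp.eye(n, dtype=bool)
    on_mesh = jnp.zeros((n, n), dtype=bool)
    on_mesh = on_mesh.at[mesh_senders, mesh_receivers].set(True)   # mark mesh edges ...
    on_mesh = on_mesh.at[mesh_receivers, mesh_senders].set(True)   # ... in both directions
    return jnp.nonzero(close & ~on_mesh)

# A 4-node strip (mesh chain 0-1-2-3) whose free end, node 3, is folded back next to node 0.
# Pairs (0, 1) and (1, 2) are also within r_w, but they are excluded as mesh edges.
x = jnp.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.3, 0.3, 0.0], [-0.2, 0.0, 0.1]])
mesh_s, mesh_r = jnp.array([0, 1, 2]), jnp.array([1, 2, 3])
s, r = world_edges(x, mesh_s, mesh_r, r_w=0.4)
print(list(zip(s.tolist(), r.tolist())))  # [(0, 3), (3, 0)]
```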

<p><strong>Processor:</strong>
Same as GNS.</p>

<p><strong>Decoder:</strong>
Output feature vector $\mathbf{p}_i$ contains additional entries like derivatives of dynamical features $\mathbf{q}_i$ or other quantities of interest like pressure.</p>

<p><strong>Loss:</strong>
In contrast to GNS, the output is extended to a more general formulation with any number of additional output features. As long as the corresponding training data are available, the loss is again the L2 loss on the one-step prediction.</p>

<p><strong>Update:</strong>
Same as GNS for Lagrangian systems, with a minor modification to integrate any additional predicted features.</p>

<p><br />
<em>‘<del>A picture</del> Pseudocode is worth a thousand words (and equations).’</em></p>

<p><br />
Following this theme, instead of repeating what is already in the paper (<cite><a href="https://openreview.net/forum?id=roNqYL0_XP">Pfaff et al., 2021</a></cite>), we invite the reader to go through the pseudocode below.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/MGN_pseudocode.png" alt="MeshGraphNets pseudocode" style="display:block;margin-left:auto;margin-right:auto;width:100%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.7: MeshGraphNets pseudocode.</figcaption>
</figure>

<p>For the sake of simplicity, this pseudocode requires no previous states, i.e. zero history ($h=0$). This setting holds for all experiments except the <a href="https://sites.google.com/view/meshgraphnets#h.bf3836crk5pc">cloth experiment</a>, in which one previous state ($h=1$) is used to estimate the velocities. In terms of the pseudocode, this means changing the input to the Encoder from $M^{t_i}$ to \(\{ M^{t_{i-h}}, \ldots, M^{t_i} \}\).</p>
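<p>In terms of node inputs, the history $h=1$ simply adds the finite-difference velocity $\mathbf{x}^t_i-\mathbf{x}^{t-1}_i$ to the features listed in the Notation table. The following sketch assembles these raw node inputs before the encoder $\epsilon^V$; the function name <code>node_inputs</code> and the two node types (e.g. a free cloth node vs. a pinned node) are illustrative assumptions.</p>

```python
import jax
import jax.numpy as jnp

def node_inputs(node_type, num_types, q=None, x_curr=None, x_prev=None):
    """Raw inputs of the node encoder eps^V (hypothetical helper)."""
    feats = [jax.nn.one_hot(node_type, num_types)]        # node type n_i as one-hot encoding
    if q is not None:                                     # dynamical features q_i, if any
        feats.append(q)
    if x_curr is not None and x_prev is not None:         # history h = 1: finite-difference velocity
        feats.append(x_curr - x_prev)
    return jnp.concatenate(feats, axis=-1)

node_type = jnp.array([0, 1, 0])
x_prev = jnp.zeros((3, 3))
x_curr = jnp.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
feats = node_inputs(node_type, num_types=2, x_curr=x_curr, x_prev=x_prev)
print(feats.shape)  # (3, 5): 2 one-hot entries + 3 velocity components
```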

<p><br />
<em>‘Now, you know MeshGraphNets. Well, then: Why molecular dynamics?’</em></p>

<p><br /></p>

<hr />

<h1 id="from-md-to-gns">From MD to GNS</h1>

<p>In this section, we show the similarities between MeshGraphNets, its predecessor GNS, and two much older methods. We begin more than 50 years ago with Molecular Dynamics (<abbr title="Molecular Dynamics">MD</abbr>) and then move to the late 1970s with Smoothed Particle Hydrodynamics (<abbr title="Smoothed Particle Hydrodynamics">SPH</abbr>), both of which can be considered among the first graph network simulators. Finally, we compare all these particle-based algorithms in a table.</p>

<h2 id="molecular-dynamics">Molecular Dynamics</h2>

<p><abbr title="Molecular Dynamics">MD</abbr> (see the textbook by <cite><a href="https://www.sciencedirect.com/book/9780122673511/understanding-molecular-simulation">Frenkel and Smit, 2002</a></cite>) is a widely used simulation method that generates the trajectory of an N-body atomic system. There are many ways to implement MD, but we restrict our discussion to the simplest, unconstrained (no ensemble constraints) Hamiltonian mechanics description.</p>

<p>The first obvious similarity between MD and MeshGraphNets is the construction of the connections/edges: both can take a mesh as input, and both compute the interactions based on spatial distances up to some fixed cut-off radius $r_W$. This similarity is subtler than it looks, because graph networks apply a sequence of $L$ iterative updates (aka GN layers), each of which reaches at most a distance $r_W$ further, leading to a total maximum reach of $L r_W$. MeshGraphNets use a number of world-graph neighbors inside the ball of radius $r_W$ comparable to MD, but perform 15 iterative updates on top of it (a ball with a $15\times$ larger radius has a $3375\times$ larger volume). Graph network approaches can therefore, in principle, exchange information with a large fraction of the other particles. This is likely one of the reasons why the graph network paradigm is so successful.</p>
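<p>The growth of the receptive field with the number of GN layers can be illustrated on a toy graph. The sketch below (hypothetical helper <code>receptive_field</code>) propagates a reachability mask through a 1D chain in which every particle only interacts with its direct neighbors; after $L$ rounds of message passing, information from up to $L$ hops away has arrived. In 3D, the corresponding reachable volume grows with $(L r_W)^3$, which gives the factor $15^3$ quoted above.</p>

```python
import jax.numpy as jnp

def receptive_field(adj, source, num_layers):
    """Boolean mask of nodes that can influence `source` after num_layers rounds of message passing."""
    reach = jnp.zeros(adj.shape[0], dtype=bool).at[source].set(True)
    for _ in range(num_layers):
        # a node joins the receptive field if one of its neighbors is already in it
        reach = reach | (adj.astype(jnp.int32) @ reach.astype(jnp.int32) > 0)
    return reach

# 1D chain of 40 particles; each interacts only with its direct neighbors (spacing ~ r_W)
idx = jnp.arange(40)
adj = jnp.abs(idx[:, None] - idx[None, :]) == 1
print(int(receptive_field(adj, 0, 1).sum()))    # 2
print(int(receptive_field(adj, 20, 15).sum()))  # 31 = 1 + 2 * 15
print(15 ** 3)                                  # 3375, the volume factor in 3D
```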

<p>In addition, in both MD and GNS the predicted accelerations are invariant to translations, and the models are equivariant to permutations of the particles.</p>

<p>As a further similarity, both MD and GNS spend most of their compute time on computing accelerations, which are used to integrate the time evolution of a Newtonian system (reminder: $F=ma$). However, there is a major difference in the time integration scheme: MD mainly uses second-order <a href="https://en.wikipedia.org/wiki/Symplectic_integrator">symplectic</a> integrators like <a href="https://en.wikipedia.org/wiki/Leapfrog_integration">Leapfrog</a>, whereas GNS uses a simple first-order Euler step. We note that a recent work by <cite><a href="https://arxiv.org/abs/1909.12790">Sanchez-Gonzalez et al., 2019</a></cite> showed that higher-order integrators such as 4th-order Runge-Kutta (RK4) offer advantages like lower errors and better generalization to varying time steps. In this sense, an interesting direction for future research would be to compare Euler, RK4 and Leapfrog integration.</p>
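<p>The practical difference between integrators is easy to see on a unit harmonic oscillator ($a=-x$), which here stands in for either a potential or a learned model. The sketch below compares forward Euler, the semi-implicit Euler update of the GNS equations, and Leapfrog (in its velocity-Verlet form) over 1000 steps with $\delta t = 0.1$. For this system, forward Euler multiplies the energy by exactly $(1+\delta t^2)$ per step, so it grows by orders of magnitude, while both symplectic schemes keep the energy bounded (within roughly 5% for semi-implicit Euler and roughly 0.25% for Leapfrog with this step size). This toy example is our own illustration, not an experiment from the papers discussed.</p>

```python
import jax
import jax.numpy as jnp

def accel(x):
    return -x  # unit harmonic oscillator (stand-in for a potential or a learned model)

def forward_euler(x, v, dt):
    return x + dt * v, v + dt * accel(x)

def semi_implicit_euler(x, v, dt):   # the update used in the GNS equations (dt = 1 there)
    v = v + dt * accel(x)
    return x + dt * v, v

def leapfrog(x, v, dt):              # velocity-Verlet form of Leapfrog
    v_half = v + 0.5 * dt * accel(x)
    x = x + dt * v_half
    return x, v_half + 0.5 * dt * accel(x)

def energy_ratio(step, dt=0.1, num_steps=1000):
    """Energy after num_steps divided by the initial energy (x0 = 1, v0 = 0)."""
    x0, v0 = jnp.float32(1.0), jnp.float32(0.0)
    x, v = jax.lax.fori_loop(0, num_steps, lambda _, s: step(s[0], s[1], dt), (x0, v0))
    return float((x ** 2 + v ** 2) / (x0 ** 2 + v0 ** 2))

for name, step in [("forward Euler", forward_euler),
                   ("semi-implicit Euler", semi_implicit_euler),
                   ("leapfrog", leapfrog)]:
    print(name, energy_ratio(step))
```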

<p>The main difference between MD and GNS is that MD computes the accelerations by evaluating a predefined potential function, whereas GNS uses a learned GN model. The most difficult engineering part of MD is the design of the potential function. The potential often has a fairly complicated form because all physical properties have to be incorporated manually; e.g. for molecules, we compute potential terms for each bond length, bond angle, torsion angle, stretch-stretch cross term, bend-bend cross term, etc., as well as the Coulomb and Lennard-Jones potentials. That is why (equivariant) NNs are becoming so attractive for this task, see e.g. <cite><a href="https://arxiv.org/abs/2101.03164">Batzner et al., 2021</a></cite>. Although GNS does not target molecular graphs, huge progress has recently been made with the graph network paradigm on molecular property prediction (<cite><a href="https://arxiv.org/abs/2110.02905">Brandstetter et al., 2021</a></cite>, <cite><a href="https://arxiv.org/abs/2106.08903">Klicpera et al., 2021</a></cite>).</p>
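<p>The "predefined potential" route can be sketched in a few lines with JAX automatic differentiation: the acceleration is simply $-\nabla \mathcal{P} / m$. We only use a truncated Lennard-Jones pair term in reduced units with hypothetical parameter values; real force fields add the bonded and Coulomb terms listed above, and usually shift the potential so that it is continuous at the cut-off (which plays the role of $r_W$).</p>

```python
import jax
import jax.numpy as jnp

def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Total truncated (unshifted) Lennard-Jones energy of a set of particles."""
    n = positions.shape[0]
    diff = positions[:, None, :] - positions[None, :, :]
    # Add 1 on the diagonal so that r > 0 everywhere; self-pairs are masked out below.
    r = jnp.sqrt(jnp.sum(diff**2, axis=-1) + jnp.eye(n))
    sr6 = (sigma / r) ** 6
    pair_energy = 4.0 * epsilon * (sr6**2 - sr6)
    mask = (r < r_cut) & ~jnp.eye(n, dtype=bool)
    # Factor 0.5 because every pair (i, j) appears twice in the double sum.
    return 0.5 * jnp.sum(jnp.where(mask, pair_energy, 0.0))

def md_accelerations(positions, mass=1.0):
    # a = F / m = -grad(P) / m, with the gradient obtained by automatic differentiation.
    return -jax.grad(lennard_jones_energy)(positions) / mass

positions = jnp.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(md_accelerations(positions))  # repulsive: about -24 and +24 along x
```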

<h2 id="smoothed-particle-hydrodynamics">Smoothed Particle Hydrodynamics</h2>

<p>A closer relative to GNS is the <abbr title="Smoothed Particle Hydrodynamics">SPH</abbr> algorithm (<cite><a href="https://ui.adsabs.harvard.edu/abs/1977AJ.....82.1013L">Lucy, 1977</a>; <a href="https://academic.oup.com/mnras/article/181/3/375/988212">Gingold, and Monaghan, 1977</a></cite>), which originates from astrophysics and is a well-known simulation technique in computer graphics (<cite><a href="https://people.inf.ethz.ch/~sobarbar/papers/Sol14/2014_EG_SPH_STAR.pdf">Ihmsen et al., 2014</a></cite>) and engineering, e.g. for multi-phase flows (<cite><a href="https://www.sciencedirect.com/science/article/pii/S0021999105004195">Hu, and Adams, 2006</a></cite>). Briefly, SPH discretizes the partial differential equations of fluid dynamics, namely the <a href="https://en.wikipedia.org/wiki/Navier%E2%80%93Stokes_equations">Navier-Stokes Equations</a> (<abbr title="Navier-Stokes Equations">NSE</abbr>), using truncated radial weighting kernels $W$, such that the resulting discrete particles follow Newtonian mechanics with a prescribed force potential $\mathcal{P}$, very much like MD. For a short overview of SPH with many visualizations, see the slides by Matthias Teschner <a href="https://cg.informatik.uni-freiburg.de/course_notes/sim_10_sph.pdf">here</a>.</p>
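<p>The first SPH steps of the comparison table, the kernel density summation $\rho_i = \sum_j m_j W_{ij}$ and an equation of state, can be sketched as follows. The 3D Poly6 kernel and the linear EoS $p = c^2(\rho - \rho_0)$ are our own illustrative choices; SPH codes use a variety of kernels and, for weakly compressible flows, often the Tait equation instead.</p>

```python
import jax.numpy as jnp

def poly6_kernel(r, h):
    """3D Poly6 smoothing kernel with support radius h (zero for r > h)."""
    coeff = 315.0 / (64.0 * jnp.pi * h**9)
    return jnp.where(r <= h, coeff * (h**2 - r**2) ** 3, 0.0)

def sph_density(positions, mass, h):
    """rho_i = sum_j m W(|x_i - x_j|, h) for equal masses, including the self-contribution j = i."""
    diff = positions[:, None, :] - positions[None, :, :]
    r = jnp.sqrt(jnp.sum(diff**2, axis=-1))
    return mass * jnp.sum(poly6_kernel(r, h), axis=1)

def linear_eos(rho, rho0, c):
    """Simple linear equation of state p = c^2 (rho - rho0)."""
    return c**2 * (rho - rho0)

positions = jnp.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
rho = sph_density(positions, mass=1.0, h=1.0)
p = linear_eos(rho, rho0=1.0, c=1.0)
print(rho)  # the two close particles have a higher density than the isolated third one
```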

<p>First of all, both SPH and GNS model materials that obey the <a href="https://en.wikipedia.org/wiki/Continuum_mechanics#Concept_of_a_continuum">continuum assumption</a>, whereas MD operates on discrete particles, e.g. atoms. In addition to much smaller spatial scales, MD also operates on extremely short time scales: its time steps are on the order of femtoseconds, and its computational cost limits the total simulated time. This is a key difference to the other two approaches.</p>

<p>Again, all properties concerning invariance and equivariance in MD also apply to SPH. SPH simply operates on different spatial and temporal scales, but, just like MD, it solves predefined equations.</p>

<p><br />
<em>Now that you roughly know how MD and SPH work, let’s compare them to GNS!</em></p>

<p><br /></p>

<h2 id="comparison">Comparison</h2>

<p>To illustrate the similarities and differences between</p>

<ol>
  <li>An MD-like approach with large pseudoparticles</li>
  <li>SPH</li>
  <li>The learned particle-based approach GNS during inference</li>
</ol>

<p>we provide the following table, divided into encoder, processor and decoder sections. The table shows one update step, whose output is the simulated acceleration $\tilde{\ddot{X}}$. For simplicity, we assume that all particles have equal mass.</p>

<blockquote>
  <p>Disclaimer: MD, SPH and GNS are particle-based methods, which makes the comparison to MeshGraphNets unintuitive. We will address that issue in the next section. For now, we simply include MeshGraphNets in the table with all other methods.</p>
</blockquote>

<div style="overflow:scroll;">
<table width="100%">
  <thead>
    <tr><th width="25%">MD-like</th>
      <th width="25%">SPH</th>
      <th style="border-right: 2px solid black;" width="25%">GNS</th>
      <th width="25%">MeshGraphNets</th></tr>
  </thead>
  <tbody>
    <tr style="border-top:2px solid black"><th colspan="4" text-align="center">Inputs and neighborhood search</th></tr>
    <tr><td>$X^t$, $\dot{X}^t$, $M^t$, $\mathcal{P}$ (potential incl. mesh), $\mathbf{n}_{1:N^v}$</td>
      <td>$X^t$, $\dot{X}^t$, kernel $W$, EoS, NSE (incl. viscosity), $\mathbf{n}_{1:N^v}$</td>
      <td style="border-right: 2px solid black;">$\{X^{t-h},\dots,X^t\}$, $\mathbf{q}_{1:N^v}$, $\mathbf{n}_{1:N^v}$, $\epsilon^V, \epsilon^W$, $\text{GN}^{1:L}$, $\delta^V$</td>
      <td>$\{M^{t-h},\dots,M^t\}$ with $M=\{\hat{V}, \hat{E}^M\}$ and $\hat{V}=\{U, X, \mathbf{q}_{1:N^v}, \mathbf{n}_{1:N^v} \}$, $\epsilon^V, \epsilon^M, \epsilon^W$, $\text{GN}^{1:L}$, $\delta^V$</td></tr>
    <tr><td> $L = \text{find_neighb}(X^t)$ <br />(~100+) </td>
      <td> $L = \text{find_neighb}(X^t)$ <br />(~30-40)</td>
      <td style="border-right: 2px solid black;"> $L = \text{find_neighb}(X^t)$ <br />(~10-20)</td>
      <td> $L = \text{find_neighb}(X^t)$ <br />(~0-20)</td></tr>    
    <tr style="border-top:2px solid black"><th colspan="4" text-align="center">Encoder</th></tr>
    <tr><td>$\tilde{\mathcal{P}} = \mathcal{P}(M^t, L, \mathbf{n}_{1:N^v})$</td>
      <td> $\rho_i=\sum_j W_{ij}(L)$  <br /> $p_i=\text{EoS}(\rho_i)$</td>
      <td style="border-right: 2px solid black;"> $V=\epsilon^V(X^{(t-h):t}, \mathbf{q}_{1:N^v}^t, \mathbf{n}_{1:N^v}^t)$, $E^W=\epsilon^W(X^t, L)$, $G^0=\{V, E^W\}$ </td>
      <td> $V=\epsilon^V(\hat{V}^{(t-h):t})$, $E^M=\epsilon^M(U,\hat{E}^M)$, $E^W=\epsilon^W(X^t, L)$, $G^0=\{V, E^M, E^W\}$</td></tr>
    <tr style="border-top:2px solid black"><th colspan="4" text-align="center">Processor</th></tr>
    <tr>
      <td>$\tilde{\ddot{\mathbf{x}}}_i = -\nabla \tilde{\mathcal{P}}(\mathbf{x}_i)$ </td>
      <td> $\tilde{\ddot{\mathbf{x}}}_i = \text{NSE}(\rho_i, p_i, \dot{\mathbf{x}}_i, \mathbf{n}_i)$ </td>
      <td style="border-right: 2px solid black;">$G^L=\text{GN}^L \circ \dots \circ \text{GN}^1 (G^0)$</td>
      <td>$G^L=\text{GN}^L \circ \dots \circ \text{GN}^1 (G^0)$</td></tr>
     <tr style="border-top:2px solid black"><th colspan="4" text-align="center">Decoder</th></tr>
    <tr><td> </td>
      <td> </td>
      <td style="border-right: 2px solid black;">$\tilde{\ddot{\mathbf{x}}}_i=\mathbf{y}_i=\delta^V(V^L)$</td>
      <td>$\tilde{\ddot{\mathbf{x}}}_i = \tilde{\mathbf{p}}_i^{t+1} = \delta^V(V^L)$</td></tr>
  </tbody>
</table>
</div>
<p>We denote the equation of state with EoS and the Navier-Stokes Equations with NSE.</p>

<p>The first part of the table lists the inputs, which can be divided into 3 groups: 1) coordinates and their time derivatives, 2) geometric relations, and 3) physical parameters. Contrary to MD and SPH, which explicitly require the velocity as input, the learned simulators approximate the velocity from a history of coordinates. For MD, the geometric relations, apart from the neighbor lists, are included in the potential. For SPH and GNS there is no direct way to include them, and for MeshGraphNets the geometric relations are included in the mesh. Lastly, contrary to MD and SPH with their explicit physical laws, the learned methods have to learn all governing parameters, like viscosity and gravity, from the data.</p>
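<p>How a coordinate history replaces an explicit velocity input can be illustrated with finite differences. The sketch below assumes a unit time step $\Delta t = 1$ and uses hypothetical helper names; it computes backward-difference velocities from $X^{t-h}, \dots, X^t$ and a second-order difference acceleration of the kind that can serve as a training target.</p>

```python
import jax.numpy as jnp

def velocity_history(position_history, dt=1.0):
    """position_history: (h + 1, N, dim), ordered X^{t-h}, ..., X^t.
    Returns the h backward-difference velocities, shape (h, N, dim)."""
    return (position_history[1:] - position_history[:-1]) / dt

def finite_difference_acceleration(x_prev, x_curr, x_next, dt=1.0):
    """Second-order central difference of three consecutive positions."""
    return (x_next - 2.0 * x_curr + x_prev) / dt**2

# One particle in 1D under constant acceleration a = 2: x(t) = t^2 for t = 0, 1, 2, 3.
positions = jnp.array([0.0, 1.0, 4.0, 9.0]).reshape(4, 1, 1)
print(velocity_history(positions).ravel().tolist())                      # [1.0, 3.0, 5.0]
print(finite_difference_acceleration(*positions[1:]).ravel().tolist())  # [2.0]
```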

<p>Having made these observations, SPH might seem like the least useful method, because it requires physical knowledge and cannot include meshes. However, the neighbor search row of the table shows that MD has to include many more particles (a larger interaction radius) to properly estimate the forces, which makes each MD update considerably more expensive.</p>

<p>Looking at the main part of the table, i.e. Encoder-Processor-Decoder, we see that the job of the graph network is to approximate the gradient of the potential $\mathcal{P}$, i.e. the acceleration, in a latent space defined by the encoder and decoder. As a loose analogy between GNS and MD, one could argue that constructing the specific potential $\tilde{\mathcal{P}}$ corresponds to encoding the geometric information $L$ into the initial graph $G^0$.</p>
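<p>The Encoder-Processor-Decoder column of the table can be sketched as a single GNS-style update step in plain JAX. Layer widths, MLP depth, the number of blocks (<code class="language-plaintext highlighter-rouge">num_layers</code>, i.e. $L$ in $\text{GN}^{1:L}$) and all helper names are our own assumptions; the sketch omits feature normalization, training and other details of the reference implementations. Because the edges only see relative displacements and distances, the predicted accelerations are translation invariant by construction.</p>

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random weights for an MLP with the given layer sizes."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = jax.nn.relu(x)
    return x

def init_gns(key, node_in, edge_in, latent=32, out_dim=2, num_layers=3):
    keys = jax.random.split(key, 3 + 2 * num_layers)
    return {
        "enc_v": init_mlp(keys[0], [node_in, latent, latent]),   # epsilon^V
        "enc_e": init_mlp(keys[1], [edge_in, latent, latent]),   # epsilon^W
        "dec": init_mlp(keys[2], [latent, latent, out_dim]),     # delta^V
        "gn": [(init_mlp(keys[3 + 2 * l], [3 * latent, latent, latent]),   # edge MLP
                init_mlp(keys[4 + 2 * l], [2 * latent, latent, latent]))   # node MLP
               for l in range(num_layers)],
    }

def gns_step(params, positions, velocities, senders, receivers):
    """One update step: returns the predicted acceleration of every node."""
    n = positions.shape[0]
    # Encoder: node features from the velocity history, edge features from relative positions only.
    v = mlp(params["enc_v"], velocities.reshape(n, -1))
    rel = positions[senders] - positions[receivers]
    dist = jnp.linalg.norm(rel, axis=-1, keepdims=True)
    e = mlp(params["enc_e"], jnp.concatenate([rel, dist], axis=-1))
    # Processor: G^L = GN^L(... GN^1(G^0)), each block with residual edge and node updates.
    for edge_mlp, node_mlp in params["gn"]:
        e = e + mlp(edge_mlp, jnp.concatenate([e, v[senders], v[receivers]], axis=-1))
        aggregated = jax.ops.segment_sum(e, receivers, num_segments=n)
        v = v + mlp(node_mlp, jnp.concatenate([v, aggregated], axis=-1))
    # Decoder: latent node features -> acceleration.
    return mlp(params["dec"], v)

positions = jnp.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
velocities = jnp.ones((4, 3, 2))             # 3 past velocities per node, 2D
senders = jnp.array([0, 1, 1, 2, 2, 3])
receivers = jnp.array([1, 0, 2, 1, 3, 2])
params = init_gns(jax.random.PRNGKey(0), node_in=3 * 2, edge_in=2 + 1)
acc = gns_step(params, positions, velocities, senders, receivers)
print(acc.shape)  # (4, 2)
```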

<hr />

<h1 id="from-gns-to-meshgraphnets">From GNS to MeshGraphNets</h1>

<h2 id="meshes">Meshes</h2>

<p>In the comparison above, a mesh enters MD-like particle methods through a modified potential function. In pure SPH this is not possible, which is why one would tend to use a finite element solver for such problems. Conventional thin-surface finite element simulators, like <a href="http://graphics.berkeley.edu/resources/ARCSim/">ArcSim</a>, contain a lot of hard-coded engineering knowledge, e.g. complicated elasto-static material laws, and are very well suited for such problems.</p>

<p>Fortunately, from the point of view of learned GN-based simulators, it does not matter whether the graph $G^0$ depends on a predefined mesh $M$ or whether we construct the interaction graph exclusively from Euclidean distances. However, the authors of the MeshGraphNets paper found that processing the two edge sets $E^M$ and $E^W$ with separate edge functions and superimposing their contributions is more powerful than training the model to implicitly distinguish the origin of each interaction. In the <a href="https://github.com/deepmind/deepmind-research/tree/master/meshgraphnets">reference implementation</a> on GitHub, a single processor of 15 GN blocks acts on all edge sets: within each block, every edge set is updated by its own edge MLP, and one node MLP then combines the aggregated messages from all edge sets. Thus, mesh-edges and world-edges do not exchange information directly, but they influence each other through the jointly updated node features from one GN block to the next, i.e. already within a single rollout step.</p>
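<p>Following our reading of the MeshGraphNets paper, world-edges connect nodes that are closer than $r_W$ in world space but are not already neighbors in the mesh. A minimal sketch of this construction is shown below; the function name, the dense distance matrix and the example geometry are our own assumptions.</p>

```python
import jax.numpy as jnp

def world_edges(world_pos, mesh_senders, mesh_receivers, r_w):
    """Directed world-edges between nodes closer than r_w in world space,
    excluding self-loops and node pairs already connected by a mesh-edge."""
    n = world_pos.shape[0]
    dist = jnp.linalg.norm(world_pos[:, None, :] - world_pos[None, :, :], axis=-1)
    close = (dist < r_w) & ~jnp.eye(n, dtype=bool)
    is_mesh_edge = jnp.zeros((n, n), dtype=bool).at[mesh_senders, mesh_receivers].set(True)
    senders, receivers = jnp.nonzero(close & ~is_mesh_edge)
    return senders, receivers

# A strip of 4 nodes (mesh chain 0-1-2-3) folded so that nodes 0 and 3 nearly touch.
world_pos = jnp.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.3], [0.0, 0.0, 0.3]])
mesh_senders = jnp.array([0, 1, 1, 2, 2, 3])
mesh_receivers = jnp.array([1, 0, 2, 1, 3, 2])
s, r = world_edges(world_pos, mesh_senders, mesh_receivers, r_w=0.5)
print(s.tolist(), r.tolist())  # [0, 3] [3, 0]: only the contact across the fold
```

<p>Nodes 1 and 2 are also within $r_W$, but they are already mesh neighbors, so they do not receive a world-edge.</p>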

<p>Including further edge sets is a small step in terms of implementation complexity and might seem like no big deal. However, it opens many new doors for engineering applications: in principle, most conventional particle-based and mesh-based simulations can be approached in this manner!</p>

<p>In the previous section on MD, SPH and GNS, we mainly discussed particle-based methods. In keeping with the general theme of making connections between traditional fluid mechanics and learned simulators, we now claim that <strong>meshes and particles are essentially the same thing</strong>.</p>

<h2 id="fluid-particle-model">Fluid-particle model</h2>

<p>Let us go back to the 1990s and the Fluid Particle Model (<abbr title="Fluid Particle Model">FPM</abbr>) by <cite><a href="https://link.aps.org/doi/10.1103/PhysRevE.57.2930">Español, 1998</a></cite>, a mesoscopic Newtonian fluid model, as opposed to the microscopic MD and the macroscopic SPH. In this work, the concept of a “fluid particle” is analyzed from the point of view of a <a href="https://en.wikipedia.org/wiki/Voronoi_diagram">Voronoi tessellation</a> of a molecular fluid. The model is presented as a generalization of both Dissipative Particle Dynamics (<abbr title="Dissipative Particle Dynamics">DPD</abbr>) by <cite><a href="https://iopscience.iop.org/article/10.1209/0295-5075/19/3/001">Hoogerbrugge, and Koelman, 1992</a></cite> and SPH.</p>

<figure>
  <img src="https://iclr.iro.umontreal.ca/c1230751-f982-4c0c-9498-f003067e2cff_1642247537/public/images/2021-12-01-meshgraphnets/Voronoi.png" alt="Delaunay and Voronoi" style="display:block;margin-left:auto;margin-right:auto;width:80%;" />
  <figcaption style="text-align: center;opacity:70%;font-style: italic;">Fig.8: Single points (left), Delaunay triangulation (middle) and Voronoi diagram (right) (Image source: <cite><a href="https://www.researchgate.net/publication/311521487_Voronoi_diagrams_-_architectural_and_structural_rod_structure_research_model_optimization">Rokicki, and Gawell, 2016</a></cite>).</figcaption>
</figure>

<p><cite><a href="https://link.aps.org/doi/10.1103/PhysRevE.57.2930">Español, 1998</a></cite> suggests using Voronoi tessellation to coarse-grain atomistic systems into pseudoparticle systems, where each pseudoparticle is an ensemble of atoms in thermal equilibrium. Starting with a set of points spread over the domain of interest, Voronoi tessellation assigns to each point (later the pseudoparticle location) the region of space that is closer to this point than to any other point. In this way, physical space is divided into non-overlapping cells that cover the full domain. Coming from MD, SPH and GNS, the cell centers correspond to the locations of the simulated particles. Before we build the connection to MeshGraphNets, we note that there are at least two important meshes related to the Voronoi diagram: 1) the mesh defined by the Voronoi edges and vertices, and 2) the triangular mesh obtained by <a href="https://en.wikipedia.org/wiki/Delaunay_triangulation#Relationship_with_the_Voronoi_diagram">Delaunay triangulation</a>, which is the dual of the Voronoi diagram (and unique for points in general position). This second mesh is of the same kind as the triangular mesh used in the flag example of the MeshGraphNets paper, where each mesh node likewise corresponds to the cell center of a simulated pseudoparticle.</p>
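<p>To turn such a triangular mesh into the mesh-edge set $E^M$ of a graph network, each triangle contributes its three sides, shared sides are deduplicated, and every edge is used in both directions. The sketch below does this for a hand-written triangle array; if the Delaunay triangulation is computed with, for example, <code class="language-plaintext highlighter-rouge">scipy.spatial.Delaunay</code>, its <code class="language-plaintext highlighter-rouge">simplices</code> attribute provides an array of the same shape. The function name is hypothetical.</p>

```python
import jax.numpy as jnp

def mesh_edges_from_triangles(triangles):
    """triangles: (T, 3) int array of node indices.
    Returns bidirectional (senders, receivers), each mesh edge once per direction."""
    edges = jnp.concatenate([
        jnp.stack([triangles[:, 0], triangles[:, 1]], axis=1),
        jnp.stack([triangles[:, 1], triangles[:, 2]], axis=1),
        jnp.stack([triangles[:, 2], triangles[:, 0]], axis=1),
    ], axis=0)
    edges = jnp.sort(edges, axis=1)    # treat (i, j) and (j, i) as the same undirected edge
    edges = jnp.unique(edges, axis=0)  # sides shared by two triangles appear only once
    senders = jnp.concatenate([edges[:, 0], edges[:, 1]])
    receivers = jnp.concatenate([edges[:, 1], edges[:, 0]])
    return senders, receivers

# A unit square split into two triangles along the diagonal 0-2.
triangles = jnp.array([[0, 1, 2], [0, 2, 3]])
senders, receivers = mesh_edges_from_triangles(triangles)
print(len(senders))  # 10 = 5 undirected edges x 2 directions
```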

<h2 id="time-extrapolation-and-training-noise">Time extrapolation and training noise</h2>

<p>Here, we argue that the invaluable training noise closely relates GNS and MeshGraphNets to mesoscopic fluid dynamics, e.g. DPD.</p>

<p>GNS has difficulties with long rollouts: the largest extrapolation in rollout length, ~8.5x, is demonstrated on the <a href="https://sites.google.com/view/learning-to-simulate/home#h.p_oVm-SKnnQ_p9">Ramps-Large example</a>, and in a <a href="https://sites.google.com/view/learning-to-simulate/home#h.p_GYmSoIisBUUX">further example</a> solids are observed to deform over long rollouts. Nevertheless, an 8.5x extrapolation is impressive, and the authors attribute it to the use of training noise. The MeshGraphNets paper demonstrates truly remarkable results, with rollouts 100x longer than the training sequences on the <a href="https://sites.google.com/view/meshgraphnets#h.qrzo5h22wpnj">flag experiment</a>, and again the authors attribute this success to training noise.</p>

<p>Following <cite><a href="https://link.aps.org/doi/10.1103/PhysRevE.57.2930">Español, 1998</a></cite>, both DPD and SPH operate on pseudoparticles, and the underlying assumptions of these approaches are that:</p>

<ol>
  <li>The ensemble of molecules included in a pseudoparticle is large enough to be considered as a thermodynamic system.</li>
  <li>The variation among neighboring Voronoi cells is small.</li>
</ol>

<p>In DPD, the ensembles are small enough that Brownian motion has to be taken into account, whereas in SPH the ensembles are so large that all atomistic thermal fluctuations cancel out. Thus, by adding the right amount of Gaussian noise (describing Brownian motion), one approximates a system in which molecular randomness plays a role. This is essentially what injecting noise into the inputs amounts to.</p>

<p>Clearly, GNS and MeshGraphNets do not operate on such small scales, and yet injecting noise seems to be a useful tool. One explanation could be that, by training the model to predict the true output from a noisy input, the model converges to the mean of the conditional distribution of the acceleration given the noisy state (the minimizer of the mean squared error), as was similarly observed by <cite><a href="https://arxiv.org/abs/1904.05158">Yeo, 2019</a></cite> for recurrent neural networks.</p>
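<p>Input-noise injection can be sketched as follows, in the spirit of the random-walk noise described in the GNS paper: Gaussian noise is accumulated as a random walk on the finite-difference velocities and integrated into the input positions, and the target is shifted so that the model learns to undo the noise on the last velocity. The function names and the way the variance is split across steps are our own assumptions; MeshGraphNets uses a related scheme with the additional hyperparameter $\gamma$.</p>

```python
import jax
import jax.numpy as jnp

def random_walk_position_noise(key, position_sequence, noise_std_last_step):
    """position_sequence: (T, N, dim), oldest first. The noise on the T - 1 finite-difference
    velocities is a Gaussian random walk whose last step has std noise_std_last_step."""
    num_velocities = position_sequence.shape[0] - 1
    step_std = noise_std_last_step / jnp.sqrt(num_velocities)
    increments = step_std * jax.random.normal(key, (num_velocities,) + position_sequence.shape[1:])
    velocity_noise = jnp.cumsum(increments, axis=0)
    # Integrate the velocity noise to positions; the oldest position stays unperturbed.
    return jnp.concatenate(
        [jnp.zeros_like(position_sequence[:1]), jnp.cumsum(velocity_noise, axis=0)], axis=0)

def noisy_inputs_and_target(key, position_sequence, next_position, noise_std):
    noise = random_walk_position_noise(key, position_sequence, noise_std)
    noisy_sequence = position_sequence + noise
    # Shift the target by the noise of the last input position, so that the target
    # acceleration cancels the noise on the last velocity instead of reproducing it.
    next_adjusted = next_position + noise[-1]
    target_acceleration = next_adjusted - 2.0 * noisy_sequence[-1] + noisy_sequence[-2]
    return noisy_sequence, target_acceleration

# Clean trajectory of 2 particles in 2D under constant acceleration 0.1 (dt = 1).
t = jnp.arange(7.0)[:, None, None]
trajectory = 0.05 * t**2 * jnp.ones((1, 2, 2))
sequence, next_position = trajectory[:6], trajectory[6]
key = jax.random.PRNGKey(0)
noisy_seq, target = noisy_inputs_and_target(key, sequence, next_position, 3e-4)
```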

<h2 id="remeshing">Remeshing</h2>

<p>In addition to introducing meshes, another important contribution of the MeshGraphNets paper is adaptive remeshing. In a nutshell, some of the output features of the model are devoted to predicting a sizing field, i.e. the desired local mesh resolution, which conventional remeshers have to derive from complicated estimates of quantities such as the local curvature. Wherever the current mesh does not match the predicted sizing field, a low-cost conventional remeshing algorithm can simply be applied in the middle of the pipeline. Remeshing is what allows MeshGraphNets to simulate the complex flag dynamics seen in the <a href="https://sites.google.com/view/meshgraphnets">demos</a>.</p>

<p>Recall the second assumption of <cite><a href="https://link.aps.org/doi/10.1103/PhysRevE.57.2930">Español, 1998</a></cite> from the previous subsection: the variation among neighboring Voronoi cells should be small. To keep it small, the size of the Voronoi cells should be inversely proportional to the local variation of their properties, so a natural approach is to sample more discretization points in regions with high property variation. One possible tool for distributing points according to such variations is the fluid relaxation approach of <cite><a href="https://www.sciencedirect.com/science/article/abs/pii/S0045782519301227">Fu et al., 2019</a></cite>.
The very same line of reasoning underlies the mesh refinement of the MeshGraphNets algorithm.</p>
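<p>As we understand the MeshGraphNets paper, the predicted sizing field is a symmetric positive-definite tensor $S_i$ per node, and an edge $\mathbf{u}_{ij}$ counts as valid if $\mathbf{u}_{ij}^\top S \, \mathbf{u}_{ij} \le 1$; invalid edges are candidates for splitting. The sketch below evaluates this criterion; averaging the two endpoint tensors is our own assumption, as are the function name and the example values.</p>

```python
import jax.numpy as jnp

def edges_to_split(mesh_pos, senders, receivers, sizing):
    """sizing: (N, d, d) symmetric positive-definite sizing tensor per node.
    Returns True for every edge u with u^T S u > 1, where S is the mean of the two
    endpoint tensors (this averaging is an assumption of this sketch)."""
    u = mesh_pos[receivers] - mesh_pos[senders]
    s = 0.5 * (sizing[senders] + sizing[receivers])
    metric = jnp.einsum("ei,eij,ej->e", u, s, u)
    return metric > 1.0

mesh_pos = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
senders, receivers = jnp.array([0, 0]), jnp.array([1, 2])
coarse = jnp.tile(jnp.eye(2), (3, 1, 1))   # target edge length ~1
fine = 4.0 * coarse                         # target edge length ~0.5
print(edges_to_split(mesh_pos, senders, receivers, coarse).tolist())  # [False, False]
print(edges_to_split(mesh_pos, senders, receivers, fine).tolist())    # [True, False]
```

<p>A smaller sizing tensor allows longer edges, so a model that predicts large $S_i$ only where the solution varies strongly refines the mesh exactly there, in line with the Voronoi argument above.</p>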

<p><br />
<em>Before you get too excited about MeshGraphNets, we will look at some alternatives.</em></p>

<hr />

<h1 id="comparison-to-related-work">Comparison to Related Work</h1>

<p>There are a number of approaches to predicting future states of physical systems that implicitly approximate the forward operator through an embedding into a latent space, as pioneered by <cite><a href="https://www.science.org/doi/10.1126/science.1127647">Hinton, and Salakhutdinov, 2006</a></cite> and later extended to the variational paradigm by <cite><a href="https://arxiv.org/abs/1312.6114">Kingma, and Welling, 2013</a></cite>. Another approach is to use graph networks at a higher level to achieve such an approximation. While the MeshGraphNets framework is very general, there are scenarios, e.g. when only the future state of a few chosen observables is needed, in which other approaches may lead to better results. In the following, we give a brief overview and comparison of these related works, which all lift the physical system into a chosen latent space and then propagate it forward in time within this latent space.</p>

<h2 id="graph-network-based">Graph Network-based</h2>

<p>The currently used graph-based approaches for fluid simulations can be categorized along the following characteristics:</p>

<ol>
  <li><em>How much physics is contained in the solver?</em>
    <ul>
      <li>Augmenting classical solvers</li>
      <li>Containing some physics</li>
      <li>Purely learned</li>
    </ul>
  </li>
  <li><em>How many time updates does the approach utilize?</em>
    <ul>
      <li>Single time step update</li>
      <li>Direct time methods</li>
    </ul>
  </li>
</ol>

<p>The trade-off among these methods is between generality and accuracy. A purely learned method as general as MeshGraphNets incurs high computational costs at training time due to its large model size. For more complex fluid flow problems, there is the additional cost of generating an adequate dataset, which depends on the simulator and the specific problem configuration and can take months (<cite><a href="https://avestia.com/ICSTA2021_Proceedings/files/paper/ICSTA_127.pdf">Dalton et al., 2021</a></cite>).</p>

<p>A prominent approach that augments a conventional differentiable solver, and hence a good point of comparison for MeshGraphNets, is presented by <cite><a href="http://proceedings.mlr.press/v119/de-avila-belbute-peres20a.html">Belbute-Peres et al., 2020</a></cite>. The authors combine the conventional <a href="https://su2code.github.io/">SU2</a> solver on a coarse grid with a graph network to achieve a significant reduction in computational costs. However, this model’s performance lags behind MeshGraphNets on more complex benchmarks. This is possibly a result of its lack of relative encodings and of the more flexible message passing steps in the MeshGraphNets processor. In addition, such approaches depend heavily on the stability of their gradients, which have previously been shown to be unstable for chaotic fluid flows (<cite><a href="https://www.sciencedirect.com/science/article/pii/S0021999112005360">Wang, 2013</a></cite>), an issue that has recently resurfaced in the context of applying gradient-based methods to general dynamical systems such as MD (<cite><a href="https://arxiv.org/abs/2111.05803">Metz et al., 2021</a></cite>).</p>

<p>Follow-up works that extend specific aspects of MeshGraphNets include:</p>
<ul>
  <li><cite><a href="https://hal.archives-ouvertes.fr/hal-03432662/document">Chen et al., 2021</a></cite> solves 2D laminar flow around complicated shapes</li>
  <li><cite><a href="https://arxiv.org/pdf/2112.09161.pdf">Rubanova et al., 2021</a></cite> solves a learned constraint optimization problem improving MeshGraphNets on tasks involving strong interactions</li>
  <li><cite><a href="https://proceedings.neurips.cc/paper/2021/hash/0cddb7c06f1cd518e1efdc0e20b70c31-Abstract.html">Xu et al., 2021</a></cite> incorporates the idea of conditional parametrization into the GNN to better capture the relationships between physical quantities.</li>
</ul>

<p>So far, we have only discussed “single time step update” methods. The alternative for problems with steady-state solutions is the so-called “direct time method”, in which one directly predicts the final solution instead of performing an iterative rollout. Such approaches are presented by <cite><a href="https://arxiv.org/abs/2105.02575">Harsch, and Riedelbauch, 2021</a></cite> and <cite><a href="https://arxiv.org/pdf/2112.10296.pdf">Meyer et al., 2021</a></cite>.</p>
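<p>The "single time step update" paradigm boils down to applying a one-step model autoregressively, which in JAX maps naturally onto <code class="language-plaintext highlighter-rouge">jax.lax.scan</code>. The sketch below uses a toy linear step in place of a learned simulator (an assumption for illustration); a direct time method would instead be trained to map the initial condition, or the problem parameters, straight to the steady state.</p>

```python
import jax
import jax.numpy as jnp

def rollout(step_fn, initial_state, num_steps):
    """Apply a one-step model autoregressively and record every intermediate state."""
    def body(state, _):
        next_state = step_fn(state)
        return next_state, next_state          # carry the state forward and record it
    _, trajectory = jax.lax.scan(body, initial_state, None, length=num_steps)
    return trajectory                          # shape (num_steps, *state.shape)

# Toy stand-in for a learned step, with steady state x* = 2.
toy_step = lambda x: 0.5 * x + 1.0
trajectory = rollout(toy_step, jnp.array(0.0), num_steps=50)
print(trajectory[:3].tolist())  # [1.0, 1.5, 1.75]
```

<p>Errors of the one-step model are fed back into the next step during such a rollout, which is exactly why long rollouts and training noise matter for GNS and MeshGraphNets.</p>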

<h2 id="comparable-paradigms">Comparable Paradigms</h2>

<p>When we reduce MeshGraphNets to the core of its architecture, i.e. the $\text{Encoder} \rightarrow \text{Processor} \rightarrow \text{Decoder}$ structure, we obtain a set of analogies to related architectures. These architectures are based on different principles, but follow the same approach of lifting the dynamics onto a latent space, where they are then able to advance the system in time with a learned forward operator. This is in stark contrast to the Neural ODEs of <cite><a href="https://dl.acm.org/doi/abs/10.5555/3327757.3327764">Chen et al., 2018</a></cite>, which utilize a recurrent neural network as encoder, but employ a fixed ODE integrator on the latent space. The Hamiltonian Generative Networks of <cite><a href="https://openreview.net/forum?id=HJenn6VFvB">Toth et al., 2020</a></cite> learn an embedding onto a Hamiltonian space, on which they then apply the learned Hamiltonian operator to advance the system in time. In contrast, Koopman theory (see <cite><a href="https://arxiv.org/abs/2102.12086">Brunton et al., 2021</a></cite> for a review) seeks an embedding into a finite-dimensional coordinate system in which Koopman eigenfunctions provide intrinsic coordinates that globally linearize the dynamics of the system, as demonstrated by <cite><a href="https://www.nature.com/articles/s41467-018-07210-0">Lusch et al., 2018</a></cite>. This leads to a linear dynamical system on the latent space, where the Koopman operator can then be iteratively applied to propagate the system in time. The cornerstones of these two approaches can be compared with MeshGraphNets in the following way:</p>

<div style="overflow:scroll;">
<table width="100%">
  <thead>
    <tr>
      <th width="33%">Hamiltonian Generative Networks</th>
      <th style="border-right: 2px solid black;" width="33%">Koopman Embeddings</th>
      <th width="33%">MeshGraphNets</th></tr>
  </thead>
  <tbody>
    <tr style="border-top:2px solid black"><th colspan="3" text-align="center">Latent Space</th></tr>
    <tr>
      <td>Hamiltonian Space</td>
      <td style="border-right: 2px solid black;">Lifted to finite-dimensional coordinate system, in which Koopman eigenfunctions globally linearize dynamics</td>
      <td>Lifted to high-dimensional space</td></tr>
    <tr style="border-top:2px solid black"><th colspan="3" text-align="center">Time-Stepping</th></tr>
    <tr>
      <td> Iterative application of Hamiltonian operator</td>
      <td style="border-right: 2px solid black;"> Iterative application of Koopman operator </td>
      <td> Iterative application of Graph Network blocks</td></tr>
    <tr style="border-top:2px solid black"><th colspan="3" text-align="center">Modelling</th></tr>
    <tr>
      <td>Full state-space </td>
      <td style="border-right: 2px solid black;">Space of Observables</td>
      <td>Full state-space</td></tr>
  </tbody>
</table>
</div>
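<p>The Koopman time-stepping idea can be illustrated with dynamic mode decomposition (DMD), a standard least-squares approximation of a finite-dimensional Koopman operator: fit $K$ such that $g(x_{t+1}) \approx K g(x_t)$, then propagate by repeated multiplication with $K$. For brevity we assume identity observables $g(x) = x$ and a linear toy system, whereas Lusch et al. learn the observables with an autoencoder; the function names are hypothetical.</p>

```python
import jax.numpy as jnp

def fit_koopman_dmd(snapshots):
    """snapshots: (d, T) matrix whose columns are consecutive observable vectors g(x_t).
    Least-squares fit of a linear operator K with g(x_{t+1}) ~ K g(x_t)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    return Y @ jnp.linalg.pinv(X)

def propagate(K, z0, num_steps):
    """Advance in latent space by repeated application of K; returns (d, num_steps + 1)."""
    zs = [z0]
    for _ in range(num_steps):
        zs.append(K @ zs[-1])
    return jnp.stack(zs, axis=1)

# Toy data: a rotation by theta per step, observed through the identity map.
theta = 0.1
K_true = jnp.array([[jnp.cos(theta), -jnp.sin(theta)], [jnp.sin(theta), jnp.cos(theta)]])
snapshots = propagate(K_true, jnp.array([1.0, 0.0]), num_steps=20)
K_fit = fit_koopman_dmd(snapshots)
print(jnp.allclose(K_fit, K_true, atol=1e-4))  # True: the linear operator is recovered
```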

<p><br /></p>

<p>What sets MeshGraphNets apart from these two paradigms is that its latent space does not have to fulfill a specific set of properties, as is required for Hamiltonian Generative Networks and Koopman embeddings. Especially for Koopman embeddings, finding the right coordinate system remains an open challenge, which makes MeshGraphNets the more general framework for the time being.</p>

<hr />

<h1 id="summary-and-practical-considerations">Summary and Practical Considerations</h1>

<p>Here, we briefly list the strengths and weaknesses of the MeshGraphNets framework, and pose some open questions for further investigation.</p>

<p><strong>Strengths</strong></p>
<ul>
  <li>Unification of particle-based and mesh-based methods.</li>
  <li>Time extrapolation to long rollouts demonstrated on the <a href="https://sites.google.com/view/meshgraphnets#h.qrzo5h22wpnj">flag experiment</a>.</li>
  <li>Remeshing is easily integrated in the framework, making it much more useful for real world problems.</li>
  <li>Learning local interactions allows extrapolating, in principle, to arbitrarily large domains and numbers of particles.</li>
</ul>

<p><strong>Weaknesses</strong></p>
<ul>
  <li>Training noise seems to be indispensable, but there is no clear strategy for finding its optimal magnitude. In addition, the authors introduce a second hyperparameter $\gamma$ to correct for the effect of the noise, resulting in two new hyperparameters in total.</li>
  <li>Long training times and many model parameters. The base model used in the <code class="language-plaintext highlighter-rouge">flag_simple</code> example has around 2.5M parameters, and training it for the suggested 10M iterations takes 5 days on a widely available GPU such as the Nvidia RTX 2070 and 2 days on the (at the time of writing) newest Nvidia RTX A6000.</li>
</ul>

<p><strong>Open Questions</strong></p>
<ul>
  <li>Scaling MeshGraphNets (so far applied to at most ~5k nodes) to industrial applications with millions of nodes would require parallelization, which is yet to be explored.</li>
  <li>Understanding the training noise is another important future direction.</li>
  <li>Training data is limited in many domains and it would be interesting to measure how much the performance depends on the amount of training data.</li>
  <li>The interpretability of the model and its components is currently limited. In this post, we have analyzed the model from the perspective of fluid mechanics, but there is much more work to be done.</li>
  <li>Extending to complex fluids or even multiphase dynamics would be yet another interesting future direction.</li>
</ul>

<p><strong>Code</strong></p>

<p>For some open source implementations, visit the related <a href="https://paperswithcode.com/paper/learning-mesh-based-simulation-with-graph-1">Papers With Code</a> page. There you will find the official repository by DeepMind using TensorFlow 1, and you should also see at least one PyTorch implementation. At the time of writing this blog, there is no JAX implementation listed there.</p>

<p>We partially fill this gap with the code snippet below. The snippet demonstrates how to implement a fully functional MeshGraphNets model in JAX with just 50 lines of code. This compact implementation should encourage people with less experience in the field to dive deeper and apply the model to their applications.</p>

<div>
<style>
 .blob-code, .blob-num {      
       font-size: 14px !important;
    } 
</style>

<script src="https://gist.github.com/anonymous0584468487/76b133510253281249001ec350b0f614.js"></script>
</div>

<h1 id="references">References</h1>

<ol>
  <li>Pfaff, Fortunato, Sanchez-Gonzalez, and Battaglia. <a href="https://openreview.net/forum?id=roNqYL0_XP">“Learning Mesh-Based Simulation with Graph Networks.”</a> International Conference on Learning Representations, 2021.</li>
  <li>Battaglia, Hamrick, Bapst, Sanchez-Gonzalez, Zambaldi, Malinowski, Tacchetti, Raposo, Santoro, Faulkner, Gulcehre, Song, Ballard, Gilmer, Dahl, Vaswani, Allen, Nash, Langston, Dyer, Heess, Wierstra, Kohli, Botvinick, Vinyals, Li, and Pascanu. <a href="https://arxiv.org/abs/1806.01261">“Relational inductive biases, deep learning, and graph networks.”</a> arXiv preprint arXiv:1806.01261, 2018.</li>
  <li>Scarselli, Gori, Chung Tsoi, Hagenbuchner, and Monfardini. <a href="https://ieeexplore.ieee.org/document/4700287">“The graph neural network model.”</a> IEEE Transactions on Neural Networks 20.1: 61-80, 2008.</li>
  <li>Bronstein, Bruna, Cohen, and Velickovic. <a href="https://arxiv.org/abs/2104.13478">Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.</a> arXiv preprint arXiv:2104.13478, 2021.</li>
  <li>Cohen. <a href="https://dare.uva.nl/search?identifier=0f7014ae-ee94-430e-a5d8-37d03d8d10e6">Equivariant Convolutional Networks.</a> PhD Thesis University of Amsterdam, 2021.</li>
  <li>Sanchez-Gonzalez, Godwin, Pfaff, Ying, Leskovec, and Battaglia. <a href="http://proceedings.mlr.press/v119/sanchez-gonzalez20a.html">“Learning to Simulate Complex Physics with Graph Networks.”</a> International Conference on Machine Learning. PMLR, 2020.</li>
  <li>Battaglia, Pascanu, Lai, Rezende, and Kavukcuoglu. <a href="https://proceedings.neurips.cc/paper/2016/hash/9657c1fffd38824e5ab0472e022e577e-Abstract.html">“Interaction Networks for Learning about Objects, Relations and Physics.”</a> Advances in Neural Information Processing Systems, 2016.</li>
  <li>Frenkel, and Smit. <a href="https://www.sciencedirect.com/book/9780122673511/understanding-molecular-simulation">“Understanding Molecular Simulation.”</a> Elsevier, 2002.</li>
  <li>Sanchez-Gonzalez, Bapst, Cranmer, and Battaglia. <a href="https://arxiv.org/abs/1909.12790">“Hamiltonian Graph Networks with ODE Integrators.”</a> arXiv preprint arXiv:1909.12790, 2019.</li>
  <li>Batzner, Musaelian, Sun, Geiger, Mailoa, Kornbluth, Molinari, Smidt, and Kozinsky. <a href="https://arxiv.org/abs/2101.03164">“E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials.”</a> arXiv preprint arXiv:2101.03164, 2021.</li>
  <li>Brandstetter, Hesselink, van der Pol, Bekkers, and Welling. <a href="https://arxiv.org/abs/2110.02905">“Geometric and Physical Quantities Improve E(3) Equivariant Message Passing.”</a> arXiv preprint arXiv:2110.02905, 2021.</li>
  <li>Klicpera, Becker, and Günnemann. <a href="https://arxiv.org/abs/2106.08903">“GemNet: Universal Directional Graph Neural Networks for Molecules.”</a> arXiv preprint arXiv:2106.08903, 2021.</li>
  <li>Lucy. <a href="https://ui.adsabs.harvard.edu/abs/1977AJ.....82.1013L">“A numerical approach to the testing of the fission hypothesis.”</a> The Astronomical Journal 82: 1013-1024, 1977.</li>
  <li>Gingold, and Monaghan. <a href="https://academic.oup.com/mnras/article/181/3/375/988212">“Smoothed particle hydrodynamics: theory and application to non-spherical stars.”</a> Monthly Notices of the Royal Astronomical Society 181.3: 375-389, 1977.</li>
  <li>Ihmsen, Orthmann, Solenthaler, Kolb, and Teschner. <a href="https://people.inf.ethz.ch/~sobarbar/papers/Sol14/2014_EG_SPH_STAR.pdf">“SPH Fluids in Computer Graphics.”</a> Eurographics, 2014.</li>
  <li>Hu, and Adams. <a href="https://www.sciencedirect.com/science/article/pii/S0021999105004195">“A multi-phase SPH method for macroscopic and mesoscopic flows.”</a> Journal of Computational Physics 213.2: 844-861, 2006.</li>
  <li>Español. <a href="https://link.aps.org/doi/10.1103/PhysRevE.57.2930">“A Fluid Particle Model.”</a> Physical Review E 57.3: 2930, 1998.</li>
  <li>Hoogerbrugge, and Koelman. <a href="https://iopscience.iop.org/article/10.1209/0295-5075/19/3/001">“Simulating Microscopic Hydrodynamic Phenomena with Dissipative Particle Dynamics.”</a> Europhysics Letters 19.3: 155, 1992.</li>
  <li>Rokicki, and Gawell. <a href="https://www.researchgate.net/publication/311521487_Voronoi_diagrams_-_architectural_and_structural_rod_structure_research_model_optimization">“Voronoi diagrams – architectural and structural rod structure research model optimization.”</a> MAZOWSZE Studia Regionalne, 2016.</li>
  <li>Yeo. <a href="https://arxiv.org/abs/1904.05158">“Short note on the behavior of recurrent neural network for dynamical system.”</a> arXiv preprint arXiv:1904.05158, 2019.</li>
  <li>Fu, Han, Hu, and Adams. <a href="https://www.sciencedirect.com/science/article/abs/pii/S0045782519301227">“An isotropic unstructured mesh generation method based on a fluid relaxation analogy.”</a> Computer Methods in Applied Mechanics and Engineering 350: 396-431, 2019.</li>
  <li>Hinton, and Salakhutdinov. <a href="https://www.science.org/doi/abs/10.1126/science.1127647">“Reducing the Dimensionality of Data with Neural Networks.”</a> Science 313.5786: 504-507, 2006.</li>
  <li>Kingma, and Welling. <a href="https://arxiv.org/abs/1312.6114">“Auto-Encoding Variational Bayes.”</a> International Conference on Learning Representations, 2014.</li>
  <li>Dalton, Lazarus, Rabbani, Gao, and Husmeier. <a href="https://avestia.com/ICSTA2021_Proceedings/files/paper/ICSTA_127.pdf">“Graph Neural Network Emulation of Cardiac Mechanics.”</a> International Conference on Statistics: Theory and Applications, 2021.</li>
  <li>Belbute-Peres, Economon, and Kolter. <a href="http://proceedings.mlr.press/v119/de-avila-belbute-peres20a.html">“Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction.”</a> International Conference on Machine Learning. PMLR, 2020.</li>
  <li>Wang. <a href="https://www.sciencedirect.com/science/article/pii/S0021999112005360">“Forward and Adjoint Sensitivity Computation of Chaotic Dynamical Systems.”</a> Journal of Computational Physics 235: 1-13, 2013.</li>
  <li>Metz, Freeman, Schoenholz, and Kachman. <a href="https://arxiv.org/abs/2111.05803">“Gradients are Not All You Need.”</a> arXiv preprint arXiv:2111.05803, 2021.</li>
  <li>Chen, Hachem, and Viquerat. <a href="https://aip.scitation.org/doi/abs/10.1063/5.0064108">“Graph neural networks for laminar flow prediction around random two-dimensional shapes.”</a> Physics of Fluids 33.12: 123607, 2021.</li>
  <li>Rubanova, Sanchez-Gonzalez, Pfaff, and Battaglia. <a href="https://arxiv.org/pdf/2112.09161.pdf">“Constraint-based graph network simulator.”</a> arXiv preprint arXiv:2112.09161, 2021.</li>
  <li>Xu, Pradhan, and Duraisamy. <a href="https://proceedings.neurips.cc/paper/2021/hash/0cddb7c06f1cd518e1efdc0e20b70c31-Abstract.html">“Conditionally-Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems.”</a> Advances in Neural Information Processing Systems 34, 2021.</li>
  <li>Harsch, and Riedelbauch. <a href="https://arxiv.org/abs/2105.02575">“Direct Prediction of Steady-State Flow Fields in Meshed Domain with Graph Networks.”</a> arXiv preprint arXiv:2105.02575, 2021.</li>
  <li>Meyer, Pottier, Ribes, and Raffin. <a href="https://arxiv.org/pdf/2112.10296.pdf">“Deep Surrogate for Direct Time Fluid Dynamics.”</a> arXiv preprint arXiv:2112.10296, 2021.</li>
  <li>Chen, Rubanova, Bettencourt, and Duvenaud. <a href="https://dl.acm.org/doi/abs/10.5555/3327757.3327764">“Neural Ordinary Differential Equations.”</a> Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.</li>
  <li>Toth, Rezende, Jaegle, Racanière, Botev, and Higgins. <a href="https://openreview.net/forum?id=HJenn6VFvB">“Hamiltonian Generative Networks.”</a> International Conference on Learning Representations, 2020.</li>
  <li>Brunton, Budišić, Kaiser, and Kutz. <a href="https://arxiv.org/abs/2102.12086">“Modern Koopman Theory for Dynamical Systems.”</a> arXiv preprint arXiv:2102.12086, 2021.</li>
  <li>Lusch, Kutz, and Brunton. <a href="https://www.nature.com/articles/s41467-018-07210-0">“Deep learning for universal linear embeddings of nonlinear dynamics.”</a> Nature Communications 9.1: 4950, 2018.</li>
</ol>


</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ");
  // Short author string: "Last, et al." for 3+ authors, "Last & Last" for 2, "Last" for 1
  let academicLFI = lfiList.map(lfi => lfi[0]);
  if (academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
  else if (academicLFI.length === 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
  else academicLFI = academicLFI[0];

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
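  // Citation key: first author's last name + year + first three title words,
  // with all whitespace and punctuation stripped
  let bibtexTitleShorthand = (lfiList[0][0]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(/\s+/g, "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();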

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>






      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
