From Molecular Dynamics to MeshGraphNets

Blog post on the paper “Learning Mesh-Based Simulation with Graph Networks” aka MeshGraphNets (Pfaff et al., 2021).

References: paper, website and GitHub.

Timeline Banner
Historical timeline of methods leading up to MeshGraphNets.

Numerical solvers that model complex physical systems, such as fluids, elastic materials, or electromagnetic fields, are based on the discretization of the underlying partial differential equations (PDEs). They use methods such as Finite Elements, Finite Volumes, or Finite Differences, which are computationally expensive and require complete knowledge of the governing PDE and boundary conditions. In some cases, free parameters may be estimated from experimental data, but this entails solving ill-posed inverse problems, which demands even more computation.

Over the last few years, Deep Learning has emerged as an alternative to classical methods, providing faster predictions and allowing for quick exploration of large design spaces. Deep Learning is especially well suited for problems in which the physics (PDEs and boundary conditions) is only partially known and must be integrated with sparse measurement data. However, matching the high accuracy of classical methods has proven to be challenging.

A core reason for this gap in accuracy is the emphasis classical methods, such as Finite Elements, place on their mesh discretization. Classical methods use dynamic mesh adaptation as the basis for both a priori and a posteriori error-control algorithms that guarantee accurate simulation results. Despite their importance in computational mechanics, nonuniform or unstructured meshes had not received much attention in Machine Learning for Physics until DeepMind proposed the MeshGraphNets framework. This framework uses Graph Networks to simulate dynamical systems with adaptive mesh representations. It learns an input-output map over the initial mesh, which can then be remeshed adaptively during rollout to increase the mesh resolution in areas with large residuals.

In this blog, we discuss the MeshGraphNets paper and its predecessor paper through the lens of the graph-learning paradigm. We claim that molecular dynamics and smoothed particle hydrodynamics are the ancestors of all graph-based, learned particle simulators and show how graph-based approaches naturally extend to meshes. Then, we compare MeshGraphNets to other approaches, both graph-based and not. Last but not least, we conclude by presenting the strengths and weaknesses of the model, directions for future work, and a code snippet of the core algorithm written in JAX.


Notation

Our nomenclature follows the MeshGraphNets paper, with some minor modifications and extensions. We emphasize the distinction between mesh and graph attributes (not to be confused with mesh-space and world-space!). In the paper, mesh nodes and edges share the same notation as graph nodes and edges. We believe this abuse of notation might confuse some readers and therefore put a hat ($\hat{\square}$) on the mesh nodes and edges.

Symbol Meaning
Mesh
$M^t = (\hat{V}, \hat{E}^M)$ Mesh at time $t$.
$\hat{\mathbf{v}}_i \in \hat{V}$ Mesh node. Contains:
1) Lagrangian: $\{\mathbf{u}_{i}, \mathbf{x}_{i}, \mathbf{q}_{i}, \mathbf{n}_{i} \}$.
2) Eulerian: $\{\mathbf{u}_{i}, \mathbf{q}_{i}, \mathbf{n}_i\}$.
$\hat{\mathbf{e}}_{ij}^M \in \hat{E}^M$ Mesh edges representing the connections, i.e. adjacency matrix.
$\mathbf{u}_i \in U, \mathbf{u}_{ij}, \left| \mathbf{u}_{ij} \right|$ Mesh-space coordinate, displacement $\mathbf{u}_{ij}=\mathbf{u}_{i}-\mathbf{u}_{j}$, and its norm.
$\mathbf{x}_i \in X, \mathbf{x}_{ij}, \left| \mathbf{x}_{ij} \right|$ World-space coordinate, displacement $\mathbf{x}_{ij}=\mathbf{x}_{i}-\mathbf{x}_{j}$, and its norm.
$\mathbf{q}_i$ Dynamical features, e.g. velocity in Eulerian systems, momentum, density.
$\mathbf{n}_i$ Node type as one-hot encoding, e.g. for boundary conditions.
$\tilde{\square}$ Simulated quantity including errors.
Graph
$G=(V, E^M, E^W)$ Multigraph. $E^W$ only in Lagrangian systems.
$\mathbf{v}_i \in V$ Graph node. Contains MLP encoding $\epsilon^V(\{\mathbf{q}_i, \mathbf{n}_i \})$. In the cloth experiment $(\mathbf{x}^t_i-\mathbf{x}^{t-1}_i)$ is included as well.
$\mathbf{e}_{ij} \in E$ General graph edge.
$\mathbf{e}_{ij}^M \in E^M$ Mesh edge modeling internal dynamics. Contains MLP encoding:
1) Lagrangian: $\epsilon^M(\{\mathbf{u}_{ij}, \left|\mathbf{u}_{ij}\right|, \mathbf{x}_{ij}, \left|\mathbf{x}_{ij}\right| \})$.
2) Eulerian: $\epsilon^M(\{\mathbf{u}_{ij}, \left|\mathbf{u}_{ij}\right|\})$.
$\mathbf{e}_{ij}^W \in E^W$ World edges modeling external dynamics, e.g. collision, contact. Contains an MLP encoding $\epsilon^W(\{ \mathbf{x}_{ij}, \left|\mathbf{x}_{ij}\right| \})$. Applicable only if:
1) Lagrangian system.
2) $\left|\mathbf{x}_{ij}\right| < r_W$ with interaction radius $r_W$.
3) Excluding nodes already connected in the mesh.
$\epsilon^V, \epsilon^M, \epsilon^W$ MLPs used to encode $\mathbf{v}_i, \mathbf{e}_{ij}^M$ and $\mathbf{e}_{ij}^W$. Output size 128.
$f^V, f^E, f^M, f^W$ MLPs used to update nodes, general edges, mesh edges and world edges in each GN layer.
$\delta^V$ MLP used as decoder.
$\mathbf{p}_i$ Output feature vector. Contains one or more of:
1) Spatial derivative, e.g. velocity, acceleration.
2) Derivatives of $\mathbf{q}_i$.
3) Some additional learned quantity, e.g. pressure, stress, sizing field.
$\mathbf{y}_i \in Y$ Output feature vector containing spatial derivatives (special case of $\mathbf{p}_i$).
$r_k, s_k$ Receiver and sender w.r.t. node $k$.

Basics: Graph Networks

Graph Networks (GNs) (Battaglia et al., 2018) is a framework that generalizes graph-based learning, and specifically the Graph Neural Network (GNN) architecture by Scarselli et al., 2008. It defines a class of functions that take a graph as input, perform computations such as convolution, and output a graph again.

In the original GN paper, the input was defined as a 3-tuple, with one of the 3 elements being a global feature. In MeshGraphNets and its predecessor, the authors made an algorithmic simplification by concatenating this global feature to the node features, resulting in the following 2-tuple:

\[\begin{align*} G = (V,E) \end{align*}\]

$V$ is the set of node features, which comprises the usual per-node features as well as the aforementioned global feature. Typical node features in the case of particle dynamics are position, velocity, mass, etc. The global feature could be anything from the potential energy of the system, through material properties such as density or viscosity, to the gravitational constant.

\[\begin{align*} V = \{\mathbf{v}_{i}\}_{i=1:N^v} \end{align*}\]

Finally, $E$ is the set of edge features between the connected nodes. If we denote the edge features by $\mathbf{e}_k$ and the index of the receiver and sender nodes by $r_k$ and $s_k$, we get:

\[\begin{align*} E = \{ (\mathbf{e}_{k}, r_{k},s_{k}) \}_{k=1:N^e}. \end{align*}\]
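For concreteness, here is a minimal sketch in JAX of how such a 2-tuple graph could be stored. The container fields and toy sizes are our own illustrative choices, not part of the paper.

```python
from typing import NamedTuple
import jax.numpy as jnp

class Graph(NamedTuple):
    nodes: jnp.ndarray      # (N_v, F_v): one feature row per node v_i
    edges: jnp.ndarray      # (N_e, F_e): one feature row per edge e_k
    senders: jnp.ndarray    # (N_e,): index s_k of the sending node
    receivers: jnp.ndarray  # (N_e,): index r_k of the receiving node

# Toy example: 3 particles, 2 directed edges (0 -> 1 and 2 -> 1).
graph = Graph(
    nodes=jnp.zeros((3, 4)),   # e.g. velocity history, node type, ...
    edges=jnp.zeros((2, 3)),   # e.g. relative displacement and its norm
    senders=jnp.array([0, 2]),
    receivers=jnp.array([1, 1]),
)
```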

Algorithm

We want to briefly discuss the GN algorithm as used in MeshGraphNets, i.e. without explicit global features.

Graph Network Block
Fig.1: Graph Network block pseudocode (Adapted from: Battaglia et al., 2018).

The main computations are the update functions $f$ and the aggregation function $\textrm{aggr}^E$. The update functions encode nodes and edges one by one and return updated features. The aggregator combines the updated representations of all edges connected to a given node in the graph.

In the spirit of a ‘general framework’, it is a matter of choice which functions to use for update and aggregation.

For the most part, MLPs are used for the updates, while the summation function (or any other permutation-invariant function like min, max or average) is used to aggregate the updated entities.
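As a rough sketch of one such GN block in JAX, the snippet below uses single linear layers as placeholders for the update MLPs and a segment sum as the permutation-invariant aggregator; all names, shapes and sizes are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def gn_block(nodes, edges, senders, receivers, f_edge, f_node):
    """One Graph Network block without global features (a sketch, not the
    reference implementation); f_edge and f_node stand in for the MLPs."""
    # Edge update: each edge sees its own features plus both endpoint nodes.
    edge_inputs = jnp.concatenate([edges, nodes[senders], nodes[receivers]], axis=-1)
    new_edges = f_edge(edge_inputs)
    # Aggregation: permutation-invariant sum of incoming edge messages per node.
    messages = jax.ops.segment_sum(new_edges, receivers, num_segments=nodes.shape[0])
    # Node update: each node sees its own features plus the aggregated messages.
    new_nodes = f_node(jnp.concatenate([nodes, messages], axis=-1))
    return new_nodes, new_edges

# Toy data: 3 nodes with 4 features, 2 edges with 3 features.
nodes = jnp.zeros((3, 4))
edges = jnp.zeros((2, 3))
senders, receivers = jnp.array([0, 2]), jnp.array([1, 1])
key = jax.random.PRNGKey(0)
W_e = jax.random.normal(key, (3 + 2 * 4, 8))   # placeholder for the edge MLP
W_n = jax.random.normal(key, (4 + 8, 8))       # placeholder for the node MLP
new_nodes, new_edges = gn_block(nodes, edges, senders, receivers,
                                f_edge=lambda x: jax.nn.relu(x @ W_e),
                                f_node=lambda x: jax.nn.relu(x @ W_n))
```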

Next, let’s look at what makes GNs special for building physics-informed learning models.

Physical biases

  1. Graph-based algorithms extend conventional state-of-the-art models like the CNN, which works only on regular grids, to non-regular grids.
  2. To learn physical systems one would have to consider physical laws. These may include:
    • Spatial Equivariance/Invariance,
    • Local Interactions,
    • Superposition Principle,
    • Differential Equation.

By default, most of these principles are captured, since the framework represents information using concepts from graph theory. For example, by using graphs, we get permutation equivariance for free and also constrain interactions to local neighborhoods. Other principles, like translation invariance, can easily be incorporated by using the relative positions between neighboring nodes instead of their absolute coordinates. Concerning the superposition principle of forces in particle-based simulations, summing the learned force representations (edge features) during aggregation makes more sense than averaging them or choosing the largest.

Helpful references on the topic of Geometric Deep Learning with further details on graphs, including invariance groups and physical intuitions, are the recent review by Bronstein et al., 2021, and the PhD thesis of Cohen, 2021.


MeshGraphNets and its Predecessor

In the following section, we build up the argument that MeshGraphNets originate from molecular dynamics. However, the connection becomes clearer if we take a step back and look at MeshGraphNets’ predecessor - GNS.

Graph Network-based Simulators (GNS)

The paper “Learning to Simulate Complex Physics with Graph Networks” by Sanchez-Gonzalez et al., 2020 uses an encoder-processor-decoder architecture to define the “Graph Network-based Simulators” (GNS) framework. With its iterative application of Graph Networks, this work improves upon the four-year-older “Interaction Networks for Learning about Objects, Relations and Physics” by Battaglia et al., 2016, which showed for the first time that Graph Networks can learn the physics of particle collisions. GNS demonstrated that GN-based approaches can handle particle dynamics in large-scale systems (up to 85k particles). The working principle is summarized as follows.

GNS scheme
Fig.2: GNS scheme (Image source: Sanchez-Gonzalez et al., 2020).

$X^{t}$ represents the state of the particle system at time $t$. With an initial time of $t_0$ and final time $t_K$, the dynamics of the system can be represented as

\[\textbf{X}^{t_{0:K}} = \{X^{t_0},X^{t_1},...X^{t_K} \}.\]

The task is then to learn the differential operator $ d_{\theta}$, which approximates the dynamics:

\[d_{\theta} : X^{t_k} \rightarrow Y^{t_k}\] \[X^{t_{k+1}} = \text{Update} \{ X^{t_k}, d_{\theta} \}\]

To do so the encoder-processor-decoder architecture is utilized:

\[\text{Encoder} \rightarrow \text{Processor} \rightarrow \text{Decoder}\]

of which we break down every single component, as well as the loss and the update-step used at inference time.

Encoder: Takes as input the particles $X^{t_k}$ at time $t_k$ (and potentially a history $h$ of somewhere between $1$ and $10$ previous states, which we omit in the equation below for simplicity) and encodes them into a graph.

\[G^{0} = \text{Encoder}(X)\]

Here, $G = (V,E)$. The paper implements the global features as part of the node features, thus no global features appear explicitly in $G$. The edges are obtained by connecting nodes that are within some interaction radius $r_W$ of each other.

GNS Encoder
Fig.3: GNS Encoder.
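The radius-based edge construction described above can be sketched as follows. This is a dense O(N^2) toy version with our own function name and values; practical implementations typically use cell lists or tree-based neighbor searches instead.

```python
import jax.numpy as jnp

def radius_edges(positions, r_w):
    """Connect every pair of particles closer than r_w (dense toy version)."""
    diff = positions[:, None, :] - positions[None, :, :]   # (N, N, dim)
    dist = jnp.linalg.norm(diff, axis=-1)
    mask = (dist < r_w) & ~jnp.eye(positions.shape[0], dtype=bool)  # no self-edges
    senders, receivers = jnp.nonzero(mask)
    # Relative displacements and their norms give translation-invariant edge features.
    edge_feats = jnp.concatenate(
        [diff[senders, receivers], dist[senders, receivers][:, None]], axis=-1)
    return senders, receivers, edge_feats

positions = jnp.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
senders, receivers, edge_feats = radius_edges(positions, r_w=0.5)
```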

Processor: It’s a multilayer Graph Network. The exact number of layers is a hyperparameter; 5-6 worked well for this paper. This GN performs message passing.

\[G^{l} = \text{GN}^l (G^{l-1})\]
GNS Processor
Fig.4: GNS Processor.

Decoder: Lastly, the output graph $G^{L}$ is decoded back to physical space, in which it represents the acceleration of particles.

\[\tilde{\ddot{X}} = Y=\text{Decoder}(G^L)\]
GNS Decoder
Fig.5: GNS Decoder.

Loss: During training, the loop does not advance to the subsequent state $\tilde{X}$, because we are interested in approximating the acceleration directly. The $\tilde{\square}$ denotes that a quantity is the output of a learned model and contains errors. Thus, we compute the loss w.r.t. the target acceleration $\mathbf{p}^{t}$.

\[\text{Loss} = \text{MSE}\left(\tilde{\ddot{X}}, \mathbf{p}^{t}\right)\]
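In code, this one-step training loss is just a mean squared error between the predicted and target accelerations; the sketch below assumes both are arrays of shape (num_particles, dim).

```python
import jax.numpy as jnp

def one_step_loss(pred_accel, target_accel):
    """Mean squared error on the one-step acceleration prediction (a sketch)."""
    return jnp.mean((pred_accel - target_accel) ** 2)

loss = one_step_loss(jnp.ones((10, 2)), jnp.zeros((10, 2)))  # -> 1.0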

Update: This extra step is only used during inference. In the simplest case, the velocity $\tilde{\dot{X}}$ is computed from the acceleration $\tilde{\ddot{X}}$ using forward Euler integration, and the integration is then repeated for the velocity to obtain the new position $\tilde{X}^{t+1}$. In the equations below, we see that the network simply learns the force acting on a Newtonian particle, whose trajectory is integrated with time step $\delta t=1$ (this $\delta t$ is not physical time, as the physical time is set by the interval between training samples; to see the equivalence between the first equation and Newtonian mechanics, consider $a=\tilde{\ddot{X}}$, $v^t= \tilde{\dot{X}}^{t}$ and $v^{t+\delta t}=\tilde{\dot{X}}^{t+1}$).

\[\begin{align} \tilde{\dot{X}}^{t+1} &= \tilde{\dot{X}}^{t} + \tilde{\ddot{X}}^{t} \qquad \Longleftrightarrow \qquad F = m a =m \frac{d v}{d t} \approx m \frac {v^{t+\delta t}-v^t}{\delta t} \\ \tilde{X}^{t+1} &= \tilde{X}^{t} + \tilde{\dot{X}}^{t+1} \end{align}\]
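A direct transcription of these update equations into JAX might look like the following sketch; the function and variable names are our own.

```python
import jax.numpy as jnp

def euler_update(position, velocity, accel, dt=1.0):
    """Inference-time update: integrate the predicted acceleration twice, exactly
    as in the equations above (first the velocity, then the position with the
    new velocity)."""
    new_velocity = velocity + dt * accel
    new_position = position + dt * new_velocity
    return new_position, new_velocity

x, v = jnp.zeros((5, 2)), jnp.ones((5, 2))
a = jnp.full((5, 2), 0.1)
x_next, v_next = euler_update(x, v, a)
```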


The GNS model shows good performance on fluid simulations and fluid-solid interactions. However, it fails to adequately model deforming meshes such as thin shells. This motivated the follow-up work MeshGraphNets.

MeshGraphNets

The MeshGraphNets framework was presented in the paper “Learning Mesh-Based Simulation with Graph Networks” by Pfaff et al., 2021 at ICLR 2021. It extends the GNS framework by supplementing the GNS Euclidean spatial coordinates with a set of edges that defines a mesh, upon which the different interactions can be learned. Thus, the problem of finding a universal interaction function is split into two separate problems: finding the interaction of “mesh”-type and “collision”-type edges. Mathematicians and physicists call this the superposition principle, i.e. splitting a complicated function into the sum of multiple simpler ones, which is precisely what was done here.

A further contribution of the paper is the introduction of adaptive remeshing (a common engineering practice) to Graph Networks. This allows MeshGraphNets to model a wider spectrum of dynamics, as illustrated by the following cases, which cannot be handled with the GNS framework:

  1. If a mesh deforms such that two distant nodes on the mesh meet in real space, then GNS will most probably become unstable and diverge, see video.
  2. If a mesh deforms dynamically and we omit adaptive remeshing, then errors accumulate quickly due to missing high-frequency information.
  3. Extending the output feature vector from $\mathbf{y}_i$ to $\mathbf{p}_i$ (see Notation), which contains additional auxiliary variables, allows the model to predict more than just positions, e.g. a stress field, and thus extends the use cases.

Next, we briefly summarize the working principles of MeshGraphNets.

MeshGraphNets
Fig.6: MeshGraphNets scheme (Image source: Pfaff et al., 2021).

The similarity to the GNS pipeline is apparent (see Fig.2). Here, we only highlight the differences between the two methods.

Encoder: Takes as additional input a predefined mesh or, during inference, the mesh from the last iteration. Then, if desired, remeshing is applied. Finally, when constructing the world graph (equivalent to the GNS graph), nodes already connected on the mesh are excluded.

Processor: Same as GNS.

Decoder: Output feature vector $\mathbf{p}_i$ contains additional entries like derivatives of dynamical features $\mathbf{q}_i$ or other quantities of interest like pressure.

Loss: In contrast to GNS, the output is extended to the more general formulation with any number of additional output features. As long as the corresponding training data is available, the L2 loss is again computed on the one-step prediction.

Update: Same as GNS for Lagrangian systems, with minor modifications to integrate the additional output features.


‘Pseudocode is worth a thousand words (and equations).’


Following this theme, instead of repeating what is already in the paper (Pfaff et al., 2021), we invite the reader to go through the pseudocode below.

MeshGraphNets pseudocode
Fig.7: MeshGraphNets pseudocode.

For the sake of simplicity, this pseudocode requires no previous states, i.e. zero history ($h=0$). This setting holds for all experiments except the cloth experiment, in which one previous state ($h=1$) is used to estimate the velocities. In terms of the pseudocode, this means changing the input of the Encoder from $M^{t_i}$ to $\{ M^{t_{i-h}},...,M^{t_i} \}$.


‘Now, you know MeshGraphNets. Well, then: Why molecular dynamics?’



From MD to GNS

In this section, we show the similarities between MeshGraphNets, its predecessor GNS and two much older methods. We begin more than 50 years ago with Molecular Dynamics (MD) and then move to the late 70s with Smoothed Particle Hydrodynamics (SPH), both of which can be considered among the first graph network simulators. At the end, we compare all these particle-based algorithms in a table.

Molecular Dynamics

MD (see the textbook by Frenkel and Smit, 2002) is a widely used simulation method that generates the trajectory of an N-body atomic system. There are many ways to implement MD, but we restrict our discussion to the simplest, unconstrained (no ensemble constraints) Hamiltonian mechanics description.

The first obvious similarity between MD and MeshGraphNets is the construction of the connections/edges: both can have a mesh as input, and both compute the interactions based on spatial distances up to some fixed cut-off threshold $r_W$. This one is actually tricky, because graph networks apply a sequence of $L$ iterative updates (aka GN layers), each of which reaches at most a radius $r_W$ away, leading to a maximum total reach of $L\, r_W$. Given that MeshGraphNets work with the same order of magnitude of world-graph neighbors inside a ball of radius $r_W$, but apply 15 iterative updates on top of it (a ball with $15\times$ larger radius has $3375\times$ the volume), graph-network approaches effectively let most of the particles interact with each other. This, among other things, explains why the graph-network paradigm is so successful.

In addition, both MD and GNS are translationally invariant w.r.t. the accelerations and permutation equivariant w.r.t. the particles.

As a further similarity, both MD and GNS spend most of their compute time on the computation of accelerations, which are used to integrate the time evolution of a Newtonian system (reminder: $F=ma$). However, there is a major difference in the time-integration scheme: MD mainly uses symplectic integrators like Leapfrog, whereas GNS simply uses forward Euler. We note that in a recent work by Sanchez-Gonzalez et al., 2019, it was shown that higher-order integrators such as 4th-order Runge-Kutta (RK4) have advantages like lower errors and better generalization w.r.t. varying time steps. In this sense, an interesting future research direction would be comparing forward Euler, RK4 and Leapfrog.

The main difference between MD and the GNS paper is that MD computes the accelerations by evaluating a predefined potential function, whereas GNS uses a learned GN model. The most difficult engineering part of MD is the design of the potential function. The potential often has a fairly complicated form because all physical properties have to be incorporated manually, i.e. for molecules we compute the potential for each bond length, bond angle, rotation angle, stretch-stretch cross term, bend-bend cross term, etc., as well as the Coulomb and Lennard-Jones potentials. That is why (equivariant) NNs are becoming so attractive for this task, see e.g. Batzner et al., 2021. Although GNS does not target molecular graphs, huge progress has recently been made with the graph-network paradigm on molecular property prediction (Brandstetter et al., 2021, Klicpera et al., 2021).

Smoothed Particle Hydrodynamics

A closer relative of GNS is the SPH algorithm (Lucy, 1977; Gingold and Monaghan, 1977), which originates from astrophysics and is a well-known simulation technique in computer graphics (Ihmsen et al., 2014) and engineering, e.g. for multi-phase flows (Hu and Adams, 2006). Briefly, SPH discretizes the partial differential equations of fluid dynamics, namely the Navier-Stokes Equations (NSE), using truncated radial weighting kernels $W$, such that the resulting discrete particles follow Newtonian mechanics with some prescribed force potential $\mathcal{P}$, very much resembling MD. For a short overview of SPH with many visualizations, see the slides of Matthias Teschner here.

First of all, both SPH and GNS model materials that obey the continuum assumption, whereas MD operates on discrete particles, e.g. atoms. In addition to much smaller spatial scales, MD also operates on extremely short time intervals due to its computational intensity, which is a key difference from the other two approaches.

Again, all properties concerning invariance and equivariance in MD also apply to SPH. SPH simply operates on different space and time scales, but it still solves predefined equations, just like MD.


Now that you more or less know how MD and SPH work, let's compare them to GNS!


Comparison

To illustrate the similarities and differences between

  1. An MD-like approach with large pseudoparticles
  2. SPH
  3. The learned particle-based approach GNS during inference

we provide the following table, divided into encoder, processor and decoder sections. The table shows one update step, whose output is the simulated acceleration $\tilde{\ddot{X}}$. For simplicity, we assume that all particles have equal mass.

Disclaimer: MD, SPH and GNS are particle-based methods, which makes the comparison to MeshGraphNets unintuitive. We will address that issue in the next section. For now, we simply include MeshGraphNets in the table with all other methods.

| | MD-like | SPH | GNS | MeshGraphNets |
| --- | --- | --- | --- | --- |
| Inputs | $X^t$, $\dot{X}^t$, $M^t$, $\mathcal{P}$ (potential incl. mesh), $\mathbf{n}_{1:N^v}$ | $X^t$, $\dot{X}^t$, kernel $W$, EoS, $NSE$ (incl. viscosity), $\mathbf{n}_{1:N^v}$ | $\{X^{t-h},...X^t\}$, $\mathbf{q}_{1:N^v}$, $\mathbf{n}_{1:N^v}$, $\epsilon^V, \epsilon^W$, $\text{GN}^{1:L}$, $\delta^V$ | $\{M^{t-h},...M^t\}$ with $M=\{\hat{V}, \hat{E}^M\}$ and $\hat{V}=\{U, X, \mathbf{q}_{1:N^v}, \mathbf{n}_{1:N^v} \}$, $\epsilon^V, \epsilon^M, \epsilon^W$, $\text{GN}^{1:L}$, $\delta^V$ |
| Neighborhood search | $L = \text{find_neighb}(X^t)$ (~100+) | $L = \text{find_neighb}(X^t)$ (~30-40) | $L = \text{find_neighb}(X^t)$ (~10-20) | $L = \text{find_neighb}(X^t)$ (~0-20) |
| Encoder | $\tilde{\mathcal{P}} = \mathcal{P}(M^t, L, \mathbf{n}_{1:N^v})$ | $\rho_i=\sum_j W_{ij}(L)$, $p_i=EoS(\rho_i)$ | $V=\epsilon^V(X^{(t-h):t}, \mathbf{q}_{1:N^v}^t, \mathbf{n}_{1:N^v}^t)$, $E^W=\epsilon^W(X^t, L)$, $G^0=\{V, E^W\}$ | $V=\epsilon^V(\hat{V}^{(t-h):t})$, $E^M=\epsilon^M(U,\hat{E}^M)$, $E^W=\epsilon^W(X^t, L)$, $G^0=\{V, E^M, E^W\}$ |
| Processor | $\tilde{\ddot{\mathbf{x}}}_i = -\nabla \tilde{\mathcal{P}}(\mathbf{x}_i)$ | $\tilde{\ddot{\mathbf{x}}}_i = NSE(\rho_i, p_i, \dot{\mathbf{x}}_i, \mathbf{n}_i)$ | $G^L=\text{GN}^L \circ ... \text{GN}^1 \circ G^0$ | $G^L=\text{GN}^L \circ ... \text{GN}^1 \circ G^0$ |
| Decoder | – | – | $\tilde{\ddot{\mathbf{x}}}_i=\mathbf{y}_i=\delta^V(V^L)$ | $\tilde{\ddot{\mathbf{x}}}_i = \tilde{\mathbf{p}_i^{t+1}} = \delta^V(V^L)$ |

We denote the equation of state with EoS and the Navier-Stokes Equations with NSE.

The first thing we notice in the table is the inputs, which can be divided into 3 groups: 1) coordinates and derivatives thereof, 2) geometric relations, and 3) physical parameters. Contrary to MD and SPH, which explicitly require the velocity as input, the learned simulators approximate the velocity from a history of the coordinates. The geometric relations, excluding the neighbor lists, are included for MD in the potential. For SPH and GNS there is no direct way to include them, while for MeshGraphNets the geometric relations are included in the mesh. And lastly, contrary to MD and SPH with their explicit physical laws, the learned methods have to learn all governing parameters, like viscosity and gravity, from the data.

Having made these observations, SPH might seem like the least useful method, because it requires physical knowledge and cannot include meshes. But if we look at the neighborhood search for MD in the table, we see that many more particles must be included (a larger interaction radius) to properly estimate the forces.

Looking at the main part of the table, i.e. Encoder-Processor-Decoder, we clearly see that the job of the Graph Network is to approximate the gradient of the potential $\mathcal{P}$ in a latent space defined by the encoder and decoder. As a loose comparison between GNS and MD, one could argue that constructing the specific potential $\tilde{\mathcal{P}}$ corresponds to encoding the geometric information $L$, similar to the construction of $G^0$.


From GNS to MeshGraphNets

Meshes

Previously, we explained how a mesh can be inserted by means of MD-like particles with a modified potential function. In pure SPH this is not possible, which is why one would typically use a finite element solver for such problems. Conventional thin-surface finite element simulators, like ArcSim, contain a lot of hard-coded engineering knowledge, e.g. complicated elasto-static material laws, and are very well suited for such problems.

Fortunately, from the point of view of learned GN-based simulators, it does not matter whether the graph $G^0$ depends on a predefined mesh $M$ or whether we construct the interaction graph exclusively based on Euclidean distances. However, the authors of the MeshGraphNets paper found that superimposing the solutions computed on the two edge sets $E^M$ and $E^W$ separately is more powerful than training the model to implicitly distinguish the origin of the interaction. According to the reference implementation on GitHub, the authors train a single stack of 15 Graph Network layers on both mesh edges and world edges, but they do not let the two edge sets interact within one rollout step. Thus, the only way that one edge set can influence the other is in the subsequent time step, after the jointly updated nodes have gathered information from both edge sets; a sketch of what this looks like at the node-update level is given below.
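In the sketch, the messages from each edge set are aggregated separately and a single node function sees both sums; f_node stands in for the learned node MLP and all shapes are illustrative.

```python
import jax
import jax.numpy as jnp

def node_update_two_edge_sets(nodes, mesh_msgs, mesh_recv, world_msgs, world_recv,
                              f_node):
    """Aggregate mesh-edge and world-edge messages separately, then let a single
    node function see both sums (f_node is a placeholder for the learned MLP)."""
    n = nodes.shape[0]
    mesh_sum = jax.ops.segment_sum(mesh_msgs, mesh_recv, num_segments=n)
    world_sum = jax.ops.segment_sum(world_msgs, world_recv, num_segments=n)
    return f_node(jnp.concatenate([nodes, mesh_sum, world_sum], axis=-1))

# Toy usage with an identity stand-in for the node MLP.
nodes = jnp.zeros((4, 8))
mesh_msgs, mesh_recv = jnp.ones((5, 8)), jnp.array([0, 1, 2, 3, 0])
world_msgs, world_recv = jnp.ones((2, 8)), jnp.array([1, 2])
out = node_update_two_edge_sets(nodes, mesh_msgs, mesh_recv, world_msgs, world_recv,
                                f_node=lambda x: x)
```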

Making this small step in terms of GN implementation complexity and including further edge sets might seem like no big deal. However, it opens many new doors for engineering applications: basically all conventional particle-based and mesh-based simulations can be approached in this manner!

In the previous section on MD, SPH and GNS, we mainly discussed particle-based methods. In keeping with the general theme of making connections between traditional fluid mechanics and learned simulators, we now claim that meshes and particles are essentially the same thing.

Fluid-particle model

We go back to the 1990s and the Fluid Particle Model (FPM) by Español, 1998, which is a mesoscopic Newtonian fluid model, as opposed to the microscopic MD and the macroscopic SPH. In this work, the concept of a “fluid particle” is analyzed from the point of view of a Voronoi tessellation of a molecular fluid. The model claims to be a generalization of Dissipative Particle Dynamics (DPD) by Hoogerbrugge and Koelman, 1992, and of SPH.

Delaunay and Voronoi
Fig.8: Single points (left), Delaunay triangulation (middle) and Voronoi diagram (right) (Image source: Rokicki, and Gawell, 2016).

Español, 1998 suggests using Voronoi tessellation as a way of coarse-graining atomistic systems into pseudoparticle systems, where each pseudoparticle is an ensemble of atoms in thermal equilibrium. Starting with a set of points spread over the domain of interest, Voronoi tessellation assigns to each point (later pseudoparticle location) the region of space that is closer to this point than to any other point. In this way, physical space is divided into non-overlapping cells that cover the full domain. Coming from MD, SPH and GNS, the cell centers correspond to the locations of the simulated particles. Before we build the connection to MeshGraphNets, we note that there are at least two important meshes related to the Voronoi diagram: 1) the mesh defined by the Voronoi edges and vertices, and 2) the unique triangular mesh obtained by Delaunay triangulation. This second mesh has a one-to-one correspondence to the triangular mesh used in the flag example of the MeshGraphNets paper, where each mesh node also corresponds to the cell center of a simulated pseudoparticle.

Time extrapolation and training noise

Here, we argue that the invaluable training noise closely relates GNS and MeshGraphNets to mesoscopic fluid dynamics, e.g. DPD.

The GNS paper faces problems with long rollouts: the longest rollout is demonstrated on the Ramps-Large example with ~8.5x extrapolation, and in a further example it is observed that solids may become deformed over long rollouts. Nevertheless, 8.5x extrapolation is impressive, and the authors attribute it to the use of training noise. In the MeshGraphNets paper, truly remarkable results were demonstrated on the flag experiment, with rollouts 100x longer than the training sequence. And again, the authors explain the success through training noise.

Following Español, 1998, both DPD and SPH operate on pseudoparticles and the underlying assumption of these approaches is that:

  1. The ensemble of molecules included in a pseudoparticle is large enough to be considered as a thermodynamic system.
  2. The variation among neighboring Voronoi cells is small.

In addition, in DPD the ensembles are not too big, such that Brownian motion still has to be considered, whereas in SPH the ensembles are so big that all atomistic thermal agitations cancel out. Thus, by imposing the right amount of Gaussian noise (mimicking Brownian motion), one approximates a system in which molecular randomness plays a role. This is essentially what injecting noise into the inputs amounts to.

Clearly, GNS and MeshGraphNets do not operate on such small scales, and yet injecting noise proves to be a useful tool. One explanation could be that by training the model to predict the true output despite noisy inputs, the model converges to the central limit of the estimated conditional distribution of the acceleration, as was previously observed in a similar setting by Yeo, 2019 for recurrent neural networks.
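In code, this noise injection is as simple as the following sketch; the noise scale sigma is a hand-picked placeholder, not a value from the papers.

```python
import jax
import jax.numpy as jnp

def noisy_positions(key, positions, sigma=1e-3):
    """Perturb the input positions with zero-mean Gaussian noise during training;
    sigma is an illustrative placeholder value."""
    return positions + sigma * jax.random.normal(key, positions.shape)

key = jax.random.PRNGKey(42)
x_noisy = noisy_positions(key, jnp.zeros((10, 3)))
```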

Remeshing

In addition to introducing meshes, another important contribution of the MeshGraphNets paper is adaptive remeshing. In a nutshell, some of the output features of the model are devoted to learning a sizing field, which encodes the locally required resolution (e.g. in regions of high curvature) and avoids complicated conventional estimates of that quantity. Whenever the sizing field calls for finer resolution, a low-cost conventional remeshing algorithm can simply be applied in the middle of the pipeline. Remeshing is what allows MeshGraphNets to simulate the complex flag dynamics seen in the demos.

Considering the second assumption of Español, 1998 from the previous subsection, it is desirable that the size of the Voronoi cells be inversely proportional to the variation of their properties, so that this variation is kept small. Thus, a natural approach is to sample more discretization points in regions with high property variation. One possible quantitative measure of these variations could be fluid relaxation (Fu et al., 2019). The very same line of reasoning is employed in the mesh refinement of the MeshGraphNets algorithm.


Before you get too excited about MeshGraphNets, we will look at some alternatives.


Comparison to Related Work

There exist a number of approaches to predicting future states of physical systems that implicitly approximate the forward operator through an embedding into a latent space, as pioneered by Hinton and Salakhutdinov, 2006 and later extended to the variational paradigm by Kingma and Welling, 2013. Another approach is to utilize graph networks at a higher level to achieve such an approximation. While the MeshGraphNets framework is very general, there are scenarios, such as when the future state of only a few chosen observables is desired, where other approaches lead to better results. In the following section, we give a brief overview and comparison of these related works, which all lift the physical system into a chosen latent space and then propagate the system forward in this latent space.

Graph Network-based

The currently used graph-based approaches for fluid simulations can be categorized along the following characteristics:

  1. How much physics is contained in the solver?
    • Augmenting classical solvers
    • Containing some physics
    • Purely learned
  2. How many time updates does the approach utilize?
    • Single time step update
    • Direct time methods

The trade-off among these methods is between generality and accuracy. A purely learned method as general as MeshGraphNets imposes high computational costs at training time due to its large model size. For more complex fluid flow problems, one additionally has to deal with the cost of generating an adequate dataset, which depends on the simulator and the specific problem configuration and can take months (Dalton et al., 2021).

A prominent idea that augments a conventional differentiable solver, and a good point of comparison to MeshGraphNets, is presented by Belbute-Peres et al., 2020. The authors combine the conventional SU2 solver on a coarse grid with a graph network to achieve a significant reduction in computational costs. However, this model's performance lags behind MeshGraphNets on more complex benchmarks. This is quite possibly a result of the lack of relative encodings and of the flexibility of the message-passing steps in the MeshGraphNets processor. In addition, such approaches are highly dependent on the stability of their gradients, which have been shown to be unstable for fluid flows before (Wang, 2013), an issue that has recently resurfaced in the context of applying gradient-based methods to general dynamical systems, such as MD (Metz et al., 2021).

Some followup works to MeshGraphNets extending specific aspects of the algorithm include:

  • Chen et al., 2021 solves 2D laminar flow around complicated shapes
  • Rubanova et al., 2021 solves a learned constraint optimization problem improving MeshGraphNets on tasks involving strong interactions
  • Xu et al., 2021 incorporates the idea of conditional parametrization into the GNN to better capture the relationships between physical quantities.

So far, only “single time step update” methods have been shown. The alternative approach for problems with steady-state solutions is called the “direct time method”, in which one directly predicts the final solution instead of performing an iterative rollout. Such approaches are presented by Harsch and Riedelbauch, 2021 and Meyer et al., 2021.

Comparable Paradigms

When we reduce MeshGraphNets to the core of its architecture, namely the $\text{Encoder} \rightarrow \text{Processor} \rightarrow \text{Decoder}$ structure, we obtain a set of analogies to related architectures. These architectures are based on different principles, but follow the same approach of lifting the dynamics onto a latent space, on which they then advance the system in time with a learned forward operator. This is in stark contrast to the Neural ODEs of Chen et al., 2018, which utilize a recurrent neural network as encoder but employ a fixed ODE integrator on the latent space. The Hamiltonian Generative Networks of Toth et al., 2020 utilize Hamiltonian dynamics by learning an embedding onto a Hamiltonian space, on which they then apply the learned Hamiltonian operator to advance the system in time. In contrast, Koopman theory (see Brunton et al., 2021 for a review) seeks an embedding into a finite-dimensional coordinate system in which the Koopman eigenfunctions provide intrinsic coordinates that globally linearize the dynamics of the system, as demonstrated by Lusch et al., 2018. This leads to a linear dynamical system on the latent space, where the Koopman operator can then be applied iteratively to propagate the system in time. The cornerstones of these two approaches can be compared with MeshGraphNets in the following way:

| | Hamiltonian Generative Networks | Koopman Embeddings | MeshGraphNets |
| --- | --- | --- | --- |
| Latent Space | Hamiltonian space | Lifted to finite-dimensional coordinate system, in which Koopman eigenfunctions globally linearize dynamics | Lifted to high-dimensional space |
| Time-Stepping | Iterative application of Hamiltonian operator | Iterative application of Koopman operator | Iterative application of Graph Network blocks |
| Modelling | Full state-space | Space of observables | Full state-space |


What sets MeshGraphNets apart from these two paradigms is its independence from a latent space that has to fulfill a set of properties, as is the case for Hamiltonian Generative Networks and Koopman Embeddings. Especially for Koopman embeddings, finding the right coordinate system remains an open challenge, making MeshGraphNets the more general framework for the time being.


Summary and Practical Considerations

Here, we briefly list the strengths and weaknesses of the MeshGraphNets framework, and pose some open questions for further investigation.

Strengths

  • Unification of particle-based and mesh-based methods.
  • Time extrapolation to long rollouts demonstrated on the flag experiment.
  • Remeshing is easily integrated in the framework, making it much more useful for real world problems.
  • Learning local interactions allows extrapolating to theoretically infinitely large domains and numbers of particles.

Weaknesses

  • Training noise seems to be indispensable, but there is no clear strategy for finding its optimal amount. In addition, the authors introduce a second hyperparameter $\gamma$ to correct for the noise effects, resulting in two new hyperparameters in total.
  • Long training times and many model parameters. The base model used in the flag_simple example has around 2.5M parameters, and training it for the suggested 10M iterations takes 5 days on an accessible GPU like the Nvidia RTX 2070 and 2 days on the newer Nvidia A6000.

Open Questions

  • Scaling MeshGraphNets (so far applied to at most 5k nodes) to industrial applications with millions of nodes would require parallelization, which is yet to be explored.
  • Understanding the training noise is another important future direction.
  • Training data is limited in many domains and it would be interesting to measure how much the performance depends on the amount of training data.
  • The interpretability of the model and its components is currently limited. In this post, we have analyzed the model from the perspective of fluid mechanics, but there is much more work to be done.
  • Extending to complex fluids or even multiphase dynamics would be yet another interesting future direction.

Code

For some open source implementations, visit the related Papers With Code page. There you will find the official repository by DeepMind using Tensorflow v1, and you should also see at least one PyTorch implementation. At the time of writing this blog, there is no JAX implementation listed there.

We partially fill this gap with the code snippet below. The snippet sketches how the core of a MeshGraphNets-style model can be implemented in JAX in well under a hundred lines of code. This compact implementation should encourage people with less experience in the field to dive deeper and apply the model to their own applications.
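Below is a minimal, self-contained sketch of this encoder-processor-decoder core. All names, the tiny MLPs, the latent size of 32 and the 3 processor layers are illustrative stand-ins (the paper uses a latent size of 128 and 15 GN layers), and details such as feature normalization, training noise, the loss and remeshing are omitted; read it as a starting point rather than a faithful reimplementation.

```python
# Minimal MeshGraphNets-style core in JAX (illustrative sketch, not the official code).
import jax
import jax.numpy as jnp

LATENT = 32       # the paper uses a latent size of 128
NUM_LAYERS = 3    # the paper uses 15 GN layers


def init_mlp(key, sizes):
    """Parameters of a small MLP, given the layer sizes [in, hidden, ..., out]."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]


def mlp(params, x):
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b


def init_params(key, node_dim, mesh_edge_dim, world_edge_dim, out_dim):
    k = jax.random.split(key, 4 + 3 * NUM_LAYERS)
    params = {
        "enc_v": init_mlp(k[0], [node_dim, LATENT, LATENT]),        # epsilon^V
        "enc_m": init_mlp(k[1], [mesh_edge_dim, LATENT, LATENT]),   # epsilon^M
        "enc_w": init_mlp(k[2], [world_edge_dim, LATENT, LATENT]),  # epsilon^W
        "dec": init_mlp(k[3], [LATENT, LATENT, out_dim]),           # delta^V
        "gn": [],
    }
    for i in range(NUM_LAYERS):
        params["gn"].append({
            "f_m": init_mlp(k[4 + 3 * i], [3 * LATENT, LATENT, LATENT]),
            "f_w": init_mlp(k[5 + 3 * i], [3 * LATENT, LATENT, LATENT]),
            "f_v": init_mlp(k[6 + 3 * i], [3 * LATENT, LATENT, LATENT]),
        })
    return params


def gn_layer(layer, nodes, e_m, send_m, recv_m, e_w, send_w, recv_w):
    """One processor layer: update both edge sets, aggregate them separately per
    node, then update the nodes; residual connections keep the latent size fixed."""
    n = nodes.shape[0]
    e_m = e_m + mlp(layer["f_m"],
                    jnp.concatenate([e_m, nodes[send_m], nodes[recv_m]], axis=-1))
    e_w = e_w + mlp(layer["f_w"],
                    jnp.concatenate([e_w, nodes[send_w], nodes[recv_w]], axis=-1))
    agg_m = jax.ops.segment_sum(e_m, recv_m, num_segments=n)
    agg_w = jax.ops.segment_sum(e_w, recv_w, num_segments=n)
    nodes = nodes + mlp(layer["f_v"],
                        jnp.concatenate([nodes, agg_m, agg_w], axis=-1))
    return nodes, e_m, e_w


def meshgraphnet(params, node_feats, mesh_edge_feats, send_m, recv_m,
                 world_edge_feats, send_w, recv_w):
    """Encoder -> Processor -> Decoder; returns one output vector p_i per node."""
    nodes = mlp(params["enc_v"], node_feats)
    e_m = mlp(params["enc_m"], mesh_edge_feats)
    e_w = mlp(params["enc_w"], world_edge_feats)
    for layer in params["gn"]:
        nodes, e_m, e_w = gn_layer(layer, nodes, e_m, send_m, recv_m,
                                   e_w, send_w, recv_w)
    return mlp(params["dec"], nodes)


# Toy usage: 6 mesh nodes, 8 mesh edges, 4 world edges, 2D output (e.g. acceleration).
key, k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 4)
params = init_params(key, node_dim=5, mesh_edge_dim=6, world_edge_dim=3, out_dim=2)
node_feats = jax.random.normal(k1, (6, 5))
mesh_edge_feats = jax.random.normal(k2, (8, 6))
world_edge_feats = jax.random.normal(k3, (4, 3))
send_m = jnp.array([0, 1, 2, 3, 4, 5, 0, 3])
recv_m = jnp.array([1, 2, 3, 4, 5, 0, 2, 5])
send_w = jnp.array([0, 2, 4, 5])
recv_w = jnp.array([3, 5, 1, 2])
pred = meshgraphnet(params, node_feats, mesh_edge_feats, send_m, recv_m,
                    world_edge_feats, send_w, recv_w)   # shape (6, 2)
```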

References

  1. Pfaff, Fortunato, Sanchez-Gonzalez, and Battaglia. “Learning Mesh-Based Simulation with Graph Networks.” International Conference on Learning Representations, 2021.
  2. Battaglia, Hamrick, Bapst, Sanchez-Gonzalez, Zambaldi, Malinowski, Tacchetti, Raposo, Santoro, Faulkner, Gulcehre, Song, Ballard, Gilmer, Dahl, Vaswani, Allen, Nash, Langston, Dyer, Heess, Wierstra, Kohli, Botvinick, Vinyals, Li, and Pascanu. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261, 2018.
  3. Scarselli, Gori, Chung Tsoi, Hagenbuchner, and Monfardini. “The graph neural network model.” IEEE Transactions on Neural Networks 20.1: 61-80, 2008.
  4. Bronstein, Bruna, Cohen, and Velickovic. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv preprint arXiv:2104.13478, 2021.
  5. Cohen. Equivariant Convolutional Networks. PhD Thesis University of Amsterdam, 2021.
  6. Sanchez-Gonzalez, Godwin, Pfaff, Ying, Leskovec, and Battaglia. “Learning to Simulate Complex Physics with Graph Networks.” International Conference on Machine Learning. PMLR, 2020.
  7. Battaglia, Pascanu, Lai, Rezende, and Kavukcuoglu. “Interaction Networks for Learning about Objects, Relations and Physics.” Advances in Neural Information Processing Systems, 2016.
  8. Frenkel, and Smit. “Understanding Molecular Simulation.” Elsevier, 2002.
  9. Sanchez-Gonzalez, Bapst, Cranmer, and Battaglia. “Hamiltonian Graph Networks with ODE Integrators.” arXiv preprint arXiv:1909.12790, 2019.
  10. Batzner, Musaelian, Sun, Geiger, Mailoa, Kornbluth, Molinari, Smidt, and Kozinsky. “E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials.” arXiv preprint arXiv:2101.03164, 2021.
  11. Brandstetter, Hesselink, van der Pol, Bekkers, and Welling. “Geometric and Physical Quantities Improve E(3) Equivariant Message Passing.” arXiv preprint arXiv:2110.02905, 2021.
  12. Klicpera, Becker, and Günnemann. “GemNet: Universal Directional Graph Neural Networks for Molecules.” arXiv preprint arXiv:2106.08903, 2021.
  13. Lucy. “A numerical approach to the testing of the fission hypothesis.” The Astronomical Journal 82: 1013-1024, 1977.
  14. Gingold, and Monaghan. “Smoothed particle hydrodynamics: theory and application to non-spherical stars.” Monthly Notices of the Royal Astronomical Society 181.3: 375-389, 1977.
  15. Ihmsen, Orthmann, Solenthaler, Kolb, and Teschner. “SPH Fluids in Computer Graphics” Eurographics, 2014.
  16. Hu, and Adams. “A multi-phase SPH method for macroscopic and mesoscopic flows.” Journal of Computational Physics 213.2: 844-861, 2006.
  17. Español. “A Fluid Particle Model.” Physical Review E 57.3: 2930, 1998.
  18. Hoogerbrugge, and Koelman. “Simulating Microscopic Hydrodynamic Phenomena with Dissipative Particle Dynamics.” Europhysics Letters 19.3: 155, 1992.
  19. Rokicki, and Gawell. “Voronoi diagrams – architectural and structural rod structure research model optimization.” MAZOWSZE Studia Regionalne, 2016.
  20. Yeo. “Short note on the behavior of recurrent neural network for dynamical system” arXiv preprint arXiv:1904.05158, 2019.
  21. Fu, Han, Hu, and Adams. “An isotropic unstructured mesh generation method based on a fluid relaxation analogy.” Computer Methods in Applied Mechanics and Engineering 350: 396-431, 2019.
  22. Hinton, and Salakhutdinov. “Reducing the Dimensionality of Data with Neural Networks.” science 313.5786: 504-507, 2006.
  23. Kingma, and Welling. “Auto-Encoding Variational Bayes.” arXiv preprint arXiv:1312.6114, 2013.
  24. Dalton, Lazarus, Rabbani, Gao, and Husmeier. “Graph Neural Network Emulation of Cardiac Mechanics.” International Conference on Statistics: Theory and Applications, 2021.
  25. Belbute-Peres, Economon, and Kolter. “Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction.” International Conference on Machine Learning. PMLR, 2020.
  26. Wang. “Forward and Adjoint Sensitivity Computation of Chaotic Dynamical Systems.” Journal of Computational Physics 235: 1-13, 2013.
  27. Metz, Freeman, Schoenholz, and Kachman. “Gradients are Not All You Need.” arXiv preprint arXiv:2111.05803, 2021.
  28. Chen, Hachem, and Viquerat. “Graph neural networks for laminar flow prediction around random two-dimensional shapes.” Physics of Fluids 33.12: 123607, 2021.
  29. Rubanova, Sanchez-Gonzalez, Pfaff, and Battaglia. “Constraint-based graph network simulator.” arXiv preprint arXiv:2112.09161, 2021.
  30. Xu, Pradhan, and Duraisamy. “Conditionally-Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems.” Advances in Neural Information Processing Systems 34, 2021.
  31. Harsch, and Riedelbauch. “Direct Prediction of Steady-State Flow Fields in Meshed Domain with Graph Networks.” arXiv preprint arXiv:2105.02575, 2021.
  32. Meyer, Pottier, Ribes, and Raffin. “Deep Surrogate for Direct Time Fluid Dynamics.” arXiv preprint arXiv:2109.09510, 2021.
  33. Chen, Rubanova, Bettencourt, and Duvenaud. “Neural Ordinary Differential Equations.” Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.
  34. Toth, Rezende, Jaegle, Racaniere, Botev, and Higgins. “Hamiltonian Generative Networks.” International Conference on Learning Representations, 2020.
  35. Brunton, Budišić, Kaiser, and Kutz. “Modern Koopman Theory for Dynamical Systems.” arXiv preprint arXiv:2102.12086, 2021.
  36. Lusch, Kutz, and Brunton. “Deep learning for universal linear embeddings of nonlinear dynamics.” Nature Communications 9.1: 1-10, 2018.