Overcoming Order in Autoregressive Graph Generation for Molecule Generation

TMLR Paper2139 Authors

04 Feb 2024 (modified: 17 Jul 2024). Decision pending for TMLR.
Abstract: Graph generation is a fundamental problem in various domains, and is of particular interest in chemistry, where graphs may be used to represent molecules. Recent work has shown that molecular graph generation using recurrent neural networks (RNNs) is advantageous compared to traditional generative approaches that require converting continuous latent representations into graphs. One issue that arises when treating graph generation as sequential generation is the arbitrary order of the sequence, which results from the particular choice of graph flattening method: in the chemistry setting, a single molecule commonly has multiple valid SMILES strings. Motivated by the use case of molecular graph generation, we propose using RNNs while accounting for the non-sequential nature of graphs by adding an Orderless Regularization (OLR) term that encourages the hidden state of the recurrent model to be invariant to the different valid orderings present under the training distribution. We demonstrate that sequential molecular graph generation models benefit from our proposed regularization scheme, especially when data is scarce. Our findings contribute to the growing body of research on graph generation and provide a valuable tool for applications requiring the synthesis of realistic and diverse graph structures.
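To make the idea of an order-invariance penalty concrete, the following is a minimal sketch and not the paper's formulation. It assumes a toy character-level Elman RNN with random weights and takes the penalty to be the squared distance between the final hidden states reached on two SMILES strings of the same molecule (here `CCO` and `OCC`, both ethanol). The names `rnn_final_state` and `orderless_penalty`, the squared-distance form, and the choice of the final state are all illustrative assumptions.

```python
import math
import random


def rnn_final_state(tokens, emb, W_h, W_x):
    """Run a tiny tanh Elman RNN over the tokens and return the final hidden state."""
    h = [0.0] * len(W_h)
    for tok in tokens:
        x = emb[tok]
        h = [
            math.tanh(
                sum(W_h[i][j] * h[j] for j in range(len(h)))
                + sum(W_x[i][k] * x[k] for k in range(len(x)))
            )
            for i in range(len(h))
        ]
    return h


def orderless_penalty(seq_a, seq_b, emb, W_h, W_x):
    """Hypothetical penalty: squared distance between the two final hidden states."""
    h_a = rnn_final_state(seq_a, emb, W_h, W_x)
    h_b = rnn_final_state(seq_b, emb, W_h, W_x)
    return sum((a - b) ** 2 for a, b in zip(h_a, h_b))


# Toy setup (assumed sizes and random weights, for illustration only).
rng = random.Random(0)
hidden_size, emb_size = 4, 3
emb = {tok: [rng.uniform(-1, 1) for _ in range(emb_size)] for tok in "CO"}
W_h = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
W_x = [[rng.uniform(-1, 1) for _ in range(emb_size)] for _ in range(hidden_size)]

# "CCO" and "OCC" are two valid SMILES strings for ethanol.
penalty = orderless_penalty("CCO", "OCC", emb, W_h, W_x)
print(f"penalty between orderings: {penalty:.4f}")
```

In a training loop, a term of this kind would typically be added to the usual sequence likelihood loss with a weighting coefficient, so that the model is pushed toward similar hidden states for different orderings of the same graph.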
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: This is the camera-ready version.
Assigned Action Editor: ~Alberto_Bietti1
Submission Number: 2139