Abstract: Recently, generative models based on the diffusion process have emerged as a promising direction for automating the design of molecules. However, directly adding continuous Gaussian noise to discrete graphs yields generated data that do not conform to the discrete graph distribution of the training set. Current graph diffusion models either corrupt discrete data through a transition matrix or relax the discrete data into continuous space for the diffusion process. These approaches make extensible conditional generation, such as adapting to text-based conditions, difficult because they lack embedding representations, and they require significant computational resources because the diffusion process operates on the bond-type matrix. This paper introduces the Hierarchical Graph Latent Diffusion Model (HGLDM), a novel variant of latent diffusion models that overcomes the difficulty of applying continuous diffusion models directly to discrete graph data. Built on the latent diffusion framework, HGLDM also avoids the high computational cost and the lack of embeddings for extensible conditional generation. In addition, by comparing HGLDM with its variant, the Graph Latent Diffusion Model (GLDM), which has only graph-level embeddings, we validate the advantage of the hierarchical graph structure for capturing the relationship between structural information and molecular properties. We evaluate our model on various conditional generation tasks and demonstrate its superior performance.
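To make the latent-diffusion idea described above concrete, the following is a minimal sketch (not the authors' implementation) of running continuous diffusion over graph embeddings rather than over discrete atom/bond matrices. The `LatentDenoiser` class, the dimensions, the timestep embedding, and the conditioning pathway `c` are all illustrative assumptions; in the paper's setting the latents would come from a hierarchical graph encoder and the condition from a property or text encoder.

```python
# Sketch: continuous diffusion in a latent space of graph embeddings.
# Hypothetical components only; names and dimensions are assumptions.
import torch
import torch.nn as nn

T = 1000                                        # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative \bar{alpha}_t

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a latent graph embedding, optionally
    conditioned on a property/text embedding c (assumed conditioning path)."""
    def __init__(self, latent_dim=128, cond_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, t, c):
        t_emb = (t.float() / T).unsqueeze(-1)   # crude scalar timestep embedding
        return self.net(torch.cat([z_t, t_emb, c], dim=-1))

def diffusion_loss(denoiser, z0, c):
    """Standard epsilon-prediction objective, applied to latents only,
    so no Gaussian noise ever touches the discrete graph representation."""
    t = torch.randint(0, T, (z0.size(0),))
    eps = torch.randn_like(z0)
    a_bar = alpha_bars[t].unsqueeze(-1)
    z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * eps  # forward noising q(z_t | z_0)
    return ((denoiser(z_t, t, c) - eps) ** 2).mean()

# Stand-ins: z0 would come from a (hypothetical) graph encoder, c from a
# property/text encoder; sampled latents are decoded back into molecules.
z0 = torch.randn(16, 128)
c = torch.randn(16, 32)
loss = diffusion_loss(LatentDenoiser(), z0, c)
```

Because the diffusion operates on a compact latent vector instead of a full bond-type matrix, this setup illustrates why a latent formulation can reduce computation and why an embedding space makes conditioning on properties or text straightforward.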
External IDs: doi:10.1145/3627673.3679547