MoCDiff: Efficient Motif-Constrained Discrete Diffusion for Molecule Generation

Published: 28 May 2026, Last Modified: 28 May 2026 | GenBio 2026 Poster | CC BY 4.0
Keywords: Molecular Generation, Motif-Aware Tokenization, Molecular Graph Generation, Discrete Diffusion Models, Graph-to-Sequence Learning
Abstract: Molecular graph generation requires models that capture chemical structure while producing valid, diverse, and novel molecules within practical sampling budgets. We present MoCDiff, a Motif-Constrained masked Discrete Diffusion framework built on two complementary components: mSENT, a motif-aware graph-to-sequence tokenizer, and an optimized constrained sampler extending Constrained Discrete Diffusion (CDD). The mSENT tokenizer biases graph traversal toward chemically coherent substructures (ring systems, aromatic regions, and strongly coupled bond patterns) so that atoms sharing rigid chemical scaffolds appear at contiguous token positions rather than being scattered by syntax-driven ordering. The constrained sampler combines inexact augmented Lagrangian updates, adaptive penalty scheduling, lazy projection, and cache-based decode checks to concentrate feasibility enforcement near the final decoded molecule, where corrections carry useful chemical signal. Under a matched MDLM backbone on QM9, mSENT raises validity from 85.3% to 90.5% and uniqueness from 75.0% to 97.7% over standard SENT tokenization. On QM9 and MOSES, the optimized sampler reduces wall-clock time per candidate by 1.62× and 1.38×, respectively, while improving accepted-sample throughput. Together, these results demonstrate that motif-aware serialization and efficient constraint enforcement are critical and complementary design choices for practical discrete diffusion over molecular graphs.
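Illustrative note: the abstract names inexact augmented Lagrangian updates and adaptive penalty scheduling without giving their form. The sketch below shows the textbook version of those two ideas on a toy continuous problem (minimize 0.5*||x - a||^2 subject to sum(x) = 1). It is not the MoCDiff sampler: the objective, the constraint, the function name `augmented_lagrangian`, and every parameter value are assumptions chosen for illustration, and lazy projection and cache-based decode checks are not modeled.

```python
def augmented_lagrangian(a, outer_iters=40, inner_steps=3, mu=0.1,
                         shrink=0.25, mu_growth=2.0, mu_max=1e3, tol=1e-9):
    """Toy problem (assumed, not from the paper):
    minimize 0.5 * ||x - a||^2  subject to  g(x) = sum(x) - 1 = 0.
    Returns (x, multiplier, final penalty)."""
    n = len(a)
    x = [0.0] * n
    lam = 0.0                        # Lagrange multiplier estimate
    prev_violation = float("inf")

    for _ in range(outer_iters):
        # Inexact inner solve: a few gradient steps on the augmented
        # Lagrangian f(x) + lam * g(x) + 0.5 * mu * g(x)^2, not a full solve.
        # The Hessian is I + mu * 1 1^T, so 1 / (1 + mu * n) is a safe step.
        step = 1.0 / (1.0 + mu * n)
        for _ in range(inner_steps):
            g = sum(x) - 1.0
            x = [xi - step * ((xi - ai) + lam + mu * g)
                 for xi, ai in zip(x, a)]

        g = sum(x) - 1.0
        violation = abs(g)

        # First-order multiplier update.
        lam += mu * g

        # Adaptive penalty: grow mu only while the constraint violation is
        # still above tolerance and failed to shrink by the target factor.
        if violation > tol and violation > shrink * prev_violation:
            mu = min(mu * mu_growth, mu_max)
        prev_violation = violation

    return x, lam, mu


if __name__ == "__main__":
    x, lam, mu = augmented_lagrangian([0.9, 0.4, 0.2])
    print("x =", x, "sum =", sum(x), "lambda =", lam, "mu =", mu)
```

For this toy problem the constrained minimizer is known in closed form, x_i = a_i - (sum(a) - 1)/n, which gives a direct check on the iterates. The penalty grows only in the early iterations, while the violation is shrinking too slowly, and then stays fixed.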
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 175