Higher-order grammar representations for molecular generation and learning

Published: 03 Mar 2026, Last Modified: 07 Apr 2026
Venue: ICLR 2026 DeLTa Workshop Poster
License: CC BY 4.0
Keywords: Higher-order Topology, Topological Deep Learning, Combinatorial Complex, Graph Generation, Molecular Foundation Model
TL;DR: We introduce HGR, a higher-order grammar representation that parses combinatorial complexes into sequences, enabling direct higher-order topology generation and transferable molecular representations, alongside the ring-enriched RingDiv benchmark.
Abstract: Molecular learning models are strongly shaped by their underlying representations, yet standard sequential and graph representations struggle to explicitly encode higher-order molecular topology such as ring systems and motifs. To address this gap, we introduce a Higher-order Grammar Representation (HGR), a principled, topology-aware framework that lifts molecules to combinatorial complexes and parses them into a compact sequence of production rules under a context-free higher-order grammar. We devise three complementary lifting strategies, which induce distinct grammars that balance topological expressiveness with rule compactness. To mitigate evaluation biases in existing benchmarks, we construct RingDiv, a ring-enriched benchmark of 1.18M molecules, and curate a higher-quality 300k subset. Across de novo molecular generation, HGR-based models achieve 100% validity and outperform strong SMILES- and diffusion-based baselines. We further develop a molecular foundation model that integrates HGR with higher-order structural inductive biases, yielding robust and transferable representations across downstream benchmarks.
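To make "lifting a molecule to a combinatorial complex" concrete, the sketch below shows one plausible ring-based lift on a toy molecular graph: atoms become rank-0 cells, bonds rank-1 cells, and rings rank-2 cells. It is an illustration only, not the paper's method. The abstract does not specify the three lifting strategies or the grammar, so the function name `lift_to_complex`, the rank assignment, and the use of a BFS fundamental cycle basis (which need not coincide with a chemist's smallest set of smallest rings) are all assumptions.

```python
from collections import deque


def lift_to_complex(num_atoms, bonds):
    """Hypothetical ring lift: return cells grouped by rank.

    Rank 0 = atoms, rank 1 = bonds, rank 2 = rings taken from a
    fundamental cycle basis of the molecular graph (an assumption,
    not necessarily the lifting used by HGR).
    """
    adj = {a: [] for a in range(num_atoms)}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)

    # Build a BFS spanning forest; every bond outside it closes one ring.
    parent = {}
    tree_edges = set()
    for root in range(num_atoms):
        if root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    tree_edges.add(frozenset((u, v)))
                    queue.append(v)

    def path_to_root(atom):
        path = [atom]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    rings = []
    for u, v in bonds:
        if frozenset((u, v)) in tree_edges:
            continue
        pu, pv = path_to_root(u), path_to_root(v)
        # Trim shared ancestors so both paths end at the lowest common one.
        while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
            pu.pop()
            pv.pop()
        # Join u -> common ancestor with common ancestor -> v.
        rings.append(frozenset(pu + pv[-2::-1]))

    return {
        0: [frozenset([a]) for a in range(num_atoms)],
        1: [frozenset(b) for b in bonds],
        2: rings,
    }


# Toy example: a six-membered ring (atoms 0-5) with one substituent (atom 6).
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
cells = lift_to_complex(7, toy_bonds)
print({rank: len(group) for rank, group in cells.items()})
```

On the toy graph, the single rank-2 cell groups the six ring atoms, which is the kind of higher-order structure that a plain atom-and-bond graph leaves implicit. How such cells are then parsed into production rules is specific to the paper and is not attempted here.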
Submission Number: 68