Keywords: Polymer Informatics, Generative models, Materials discovery, Domain-specific evaluation, SMILES-based modeling, Transformers
TL;DR: PoGE combines a transformer-based polymer generative model with a physics-informed evaluation suite. The model generates chemically valid polymers that closely match experimental structures, advancing polymer design and discovery.
Abstract: Recent advances in machine learning have accelerated progress in chemistry, enabling new capabilities in molecular design, property prediction, and materials discovery. A critical challenge in materials science is designing polymers with targeted macroscopic properties. However, prior generative models often fail to produce chemically valid polymer structures, hindering progress toward this goal. We introduce PoGE (Polymer Generation and Evaluation), a framework comprising two complementary components: a physics-informed evaluation suite for polymer generative models, and an unconditional transformer-based generative model adapted to polymer representations. Building upon and extending established molecule-centric benchmarks, our evaluation uses the Wasserstein distance to quantify the alignment between generated and experimental property distributions. The generative model is trained on a hybrid corpus of synthetic and experimental polymer representations and enforces polymer-specific validity constraints (“p-validity”) that go beyond standard small-molecule validity. PoGE achieves high p-validity and significantly improved agreement with experimental property distributions compared to prior methods, even without explicit property conditioning during generation. By releasing a comprehensive benchmark, a high-quality pre-training corpus, and the trained model, PoGE establishes a foundation for conditional polymer generation tasks (e.g., on-demand reverse design), enabling targeted property optimization and accelerating reproducible, domain-aware polymer discovery.
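To illustrate the distribution-level metric mentioned in the abstract, the following is a minimal sketch of the one-dimensional Wasserstein-1 distance between two empirical samples, for example a property computed on generated polymers versus the same property measured experimentally. The function name `wasserstein_1d` and the toy values are hypothetical and chosen only for illustration; the submission's own evaluation code may differ.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two 1D empirical distributions.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b
    are the empirical CDFs of the two samples. Sample sizes may differ.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    # Both empirical CDFs are piecewise constant between consecutive points
    # of the pooled, sorted sample, so the integral reduces to a finite sum.
    pooled = sorted(a_sorted + b_sorted)

    total = 0.0
    for left, right in zip(pooled, pooled[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Toy numbers (made up for illustration, not taken from the paper).
generated_property = [0.0, 1.0, 3.0]
experimental_property = [5.0, 6.0, 8.0]
distance = wasserstein_1d(generated_property, experimental_property)
```

A smaller distance indicates that the generated property distribution lies closer to the experimental one. For one-dimensional samples, `scipy.stats.wasserstein_distance` computes the same quantity and would be the more practical choice on real data.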
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9207