Keywords: Material Design & Generation, Crystal Structure Prediction, MOF Structure Prediction, De Novo Generation, Generative Models, Benchmark
TL;DR: We propose MGB, a benchmark for evaluating deep generative models in material generation across crystals, MOFs, and out-of-distribution tasks.
Abstract: We present MGB (Material Generation Benchmark), a comprehensive and standardized platform for evaluating deep generative models in materials science. MGB covers a range of tasks, including crystal structure prediction, de novo material generation, metal-organic framework (MOF) structure prediction, and out-of-distribution (OOD) generation, with datasets ranging from inorganic crystals to complex MOFs. It integrates recent methods, from large language models (LLMs) to diffusion-based and hybrid approaches. A key feature of MGB is its dedicated OOD test sets, which enable rigorous evaluation of generalization.
To ensure fair comparison, MGB employs multi-dimensional metrics that jointly assess structural accuracy, chemical validity, distributional coverage, physical plausibility, and computational efficiency. Extensive experiments reveal clear performance patterns: diffusion models excel at predicting complex crystalline systems, LLMs achieve competitive local accuracy, and MOF-specific flow models substantially outperform general-purpose approaches on MOF prediction. While most methods achieve near-perfect structural validity in de novo generation, their ability to balance accuracy, generalization, and efficiency varies considerably.
We select LLMs for the OOD case studies because of their comparatively strong performance on in-distribution benchmarks. The results reveal a critical limitation: despite strong in-distribution accuracy, LLMs completely fail to generalize to unseen structural families. By establishing a unified framework and offering transparent comparative insights, MGB aims to drive the development of more robust and efficient generative models for materials discovery.
We are preparing all code and model weights for a clean open-source release.
Submission Track: Benchmarking in AI for Materials Design - Full Paper
Submission Category: AI-Guided Design
AI4Mat RLSF: Yes
Submission Number: 129