Keywords: Software Frameworks, Materials Science, OpenCatalyst Dataset, Graph Neural Networks
TL;DR: A novel, flexible toolkit that lets machine learning researchers experiment on materials science data, such as the OpenCatalyst dataset, and quickly scale up to high-performance computing systems.
Abstract: The Open MatSci ML Toolkit is a flexible, self-contained, and scalable Python-based framework for applying deep learning models and methods to scientific data, with a specific focus on materials science and the OpenCatalyst dataset. The primary components of our toolkit include: 1. Scalable computation of experiments, leveraging PyTorch Lightning across different computation capabilities (laptop, server, cluster) and hardware platforms (CPU, GPU, XPU) without sacrificing compute or modeling performance; 2. Support for the Deep Graph Library (DGL) for rapid graph neural network development. By sharing this toolkit with the research community via an open-source release, we aim to: 1. Make it easier for new machine learning researchers and practitioners to get started with the OpenCatalyst dataset, which is currently the largest computational materials science dataset; 2. Enable the scientific community to apply advanced machine learning tools to high-impact scientific challenges, such as modeling materials behavior for climate change applications.
Paper Track: Software & Tutorials
Submission Category: AI-Guided Design
Supplementary Material: zip