Track: Track 2: Dataset Proposal Competition
Keywords: Foundational Molecular Formulation Database; Mixture Property Prediction; Machine Learning for Chemistry; Autonomous Experimentation (AE); Self-driving Laboratory (SDL)
Abstract: Predicting the properties of multi-component mixtures remains a fundamental open problem across chemistry and materials science. Unlike single-molecule systems, mixtures exhibit strong nonlinearities driven by thermodynamic excess properties and interfacial interactions, which makes linear mixing rules ineffective. Existing datasets are sparse and fragmented, and they lack negative results and standardized metadata, which hinders the development of machine learning (ML) models that generalize across formulation spaces. We propose the Foundational Molecular Formulation Database, a large-scale, open dataset generated using modular self-driving laboratories (SDLs) for automated, high-throughput experimentation. The database will span four domains (battery electrolytes, thermofluids, fragrances, and solution-processed semiconductors), capturing key functional properties (e.g., ionic conductivity, thermal transport, olfactory descriptors, film stability) and associated metadata. This resource is designed to benchmark ML tasks in property prediction, generative mixture design, and active learning, while enabling models to learn emergent structure-property relationships in high-dimensional, combinatorial spaces. By providing standardized, dense sampling of mixture property landscapes, this effort aims to establish a foundation for data-driven discovery analogous to the role of ImageNet in vision or AlphaFold datasets in structural biology.
Submission Number: 325