CheMixHub: Datasets and Benchmarks for Chemical Mixture Property Prediction

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
Venue: NeurIPS 2025 Datasets and Benchmarks Track poster
License: CC BY 4.0
Keywords: chemical mixtures, deep learning, molecular property prediction
Abstract: Developing improved predictive models for multi-molecular systems is crucial, as nearly every chemical product in use results from a mixture of chemicals. Although it is a vital part of the industry pipeline, the chemical mixture space remains relatively unexplored by the machine learning (ML) community. In this paper, we introduce CheMixHub, a holistic benchmark for molecular mixtures spanning a corpus of 11 chemical mixture property prediction tasks. With applications ranging from drug delivery formulations to battery electrolytes, CheMixHub currently totals approximately 500k data points gathered and curated from 7 publicly available datasets. We devise various data splitting techniques to assess context-specific generalization and model robustness, providing a foundation for the development of predictive models for chemical mixture properties. Furthermore, we map out the modelling space of deep learning models for chemical mixtures, establishing initial benchmarks for the community. This dataset has the potential to accelerate chemical mixture development, encompassing reformulation, optimization, and discovery. The dataset and code for the benchmarks can be found at: https://github.com/chemcognition-lab/chemixhub
Croissant File: zip
Dataset URL: https://github.com/chemcognition-lab/chemixhub
Code URL: https://github.com/chemcognition-lab/chemixhub
Supplementary Material: pdf
Primary Area: AI/ML Datasets & Benchmarks for physics (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 2062