Using machine-learning and large-language-model extracted data to predict copolymerizations

Published: 08 Jul 2024, Last Modified: 23 Jul 2024, Venue: AI4Mat-Vienna-2024 Poster, License: CC BY 4.0
Submission Track: Short Paper
Submission Category: AI-Guided Design
Keywords: data extraction, copolymerization, LLMs, reaction prediction
Abstract: Predicting the outcome of chemical reactions using machine-learning approaches can significantly enhance research in chemistry and materials science. The synthesis of polymers, for instance, depends heavily on reaction conditions such as temperature and solvent, making it challenging to predict products from monomer information alone. In this work, we address this challenge by compiling the first comprehensive copolymerization dataset that includes reaction conditions, consisting of 1138 reactions involving 347 unique monomers. We employed a vision language model to extract data from 361 scientific articles, overcoming the limitations of traditional visual document understanding tools. In addition, we developed a novel data-driven filtering approach to further improve performance. Using this data, we built the first predictive models for copolymer reactivity that can predict whether a given reaction system favors homopolymerization. Our work showcases how advances in machine learning, in particular large language models, make it possible to address complex problems by creating bespoke datasets in a flexible and scalable fashion.
Submission Number: 3