A Framework for Toxic PFAS Replacement based on GFlowNet and Chemical Foundation Model

Published: 28 Oct 2023, Last Modified: 03 Dec 2023, Venue: NeurIPS2023-AI4Science Poster
Keywords: Molecule Generation, Toxic PFAS replacement, Foundation Model, GFlowNet
TL;DR: Introducing MatGFN-PFAS, an AI system based on GFlowNets and Chemical Foundation Models, designed to create safer substitutes for PFAS.
Abstract: Per- and polyfluoroalkyl substances (PFAS) are a broad class of molecules used in almost every sector of industry and consumer goods. PFAS exhibit highly desirable properties, such as high durability, water repellency, and high acidity, that are difficult to match. As a side effect, PFAS persist in the environment and have detrimental effects on human health. Epidemiological research has linked PFAS exposure to chronic health conditions, including dyslipidemia, cardiometabolic disorders, liver damage, and hypercholesterolemia. Recently, public health agencies have significantly strengthened regulations on the use of PFAS. Alternatives are therefore needed to maintain the pace of technological development in the many areas that have traditionally relied on PFAS. To support the discovery of alternatives, we introduce MatGFN-PFAS, an AI system that generates PFAS replacements. We build MatGFN-PFAS using Generative Flow Networks (GFlowNets) for generation and a Chemical Language Model (MolFormer) for property prediction. We evaluate MatGFN-PFAS by exploring potential replacements for PFAS superacids, defined as molecules with negative pKa, which are critical for the semiconductor industry. It may be challenging to eliminate PFAS superacids entirely as a class because of the strong constraints on their functional performance. The proposed approach accounts for this possibility by also enabling the generation of safer PFAS superacids. We evaluate two design strategies: 1) using Tversky similarity to design molecules similar to a target PFAS, and 2) directly generating molecules with negative pKa and low toxicity. In this paper, we study six PFAS molecules with the structure $R-CF_{2}OCF_{2}-R'$. For the query PFAS SMILES $CC1CC(CC(F)(F)C(F)(F)OC(F)(F)C(F)(F)S(=O)(=O)O)OC1=O$, the MatGFN-PFAS system generated a candidate with very low toxicity, $LD_{50} = 7304.23$, strong acidity, $pKa = -1.92$, and a high similarity score, $89.32\%$, to the studied PFAS molecule. The results demonstrate that MatGFN-PFAS consistently generates replacement molecules that satisfy all of the aforementioned constraints. The resulting datasets for this ongoing study are available at https://ibm.box.com/v/MatGFN-PFAS-generated-datasets.
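The first design strategy in the abstract relies on Tversky similarity. As a rough illustration only, the sketch below computes the standard Tversky index between two feature sets. The abstract does not say which molecular fingerprint or which weights the authors use, so the integer feature sets (standing in for on-bit indices of a fingerprint), the function name `tversky_index`, and the `alpha`/`beta` defaults are all assumptions made for this example.

```python
def tversky_index(query, candidate, alpha=0.5, beta=0.5):
    """Tversky index between two sets of molecular features.

    query, candidate: sets of hashable features (hypothetical stand-ins for
    the on-bit indices of a molecular fingerprint).
    alpha: weight on features found only in the query.
    beta: weight on features found only in the candidate.
    """
    common = len(query & candidate)
    only_query = len(query - candidate)
    only_candidate = len(candidate - query)
    denominator = common + alpha * only_query + beta * only_candidate
    # Two empty feature sets have no defined similarity; return 0.0 by convention.
    return common / denominator if denominator else 0.0


# Made-up feature sets, not derived from any molecule in the paper.
query_features = {1, 2, 3, 4}
candidate_features = {3, 4, 5}

tanimoto_like = tversky_index(query_features, candidate_features, alpha=1.0, beta=1.0)
dice_like = tversky_index(query_features, candidate_features, alpha=0.5, beta=0.5)
query_weighted = tversky_index(query_features, candidate_features, alpha=1.0, beta=0.0)
```

With `alpha = beta = 1` the index reduces to Tanimoto similarity, and with `alpha = beta = 0.5` to the Dice coefficient. Unequal weights make the measure asymmetric, which can be useful when a candidate should retain the query's features without being penalized for features of its own.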
Submission Track: Original Research
Submission Number: 70