Abstract: Classifying organic reactions is complex and tedious, and understanding the classification rules often requires domain knowledge. To reduce this requirement, BERT-based deep learning methods have been applied to classify reactions from their SMILES strings. However, the same reaction can be written as different but equivalent SMILES strings, and BERT-based methods are observed to be highly sensitive to which string is chosen. Because GNN-based methods operate on the molecular graph rather than on the string itself, they are in principle robust to equivalent SMILES strings. Here, we propose ContraGIN, a graph isomorphism network (GIN) with contrastive learning that encourages feature fusion learning between the precursors and products of a reaction. Experiments show that the new method focuses on learning the atoms near the bonds that change during the reaction, and consequently achieves a classification accuracy of 99.30% on the USPTO 1k TPL dataset. Additional experiments show that ContraGIN is more robust and much faster for larger reactions (with more atoms) and more complicated reactions (with rings).
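The robustness argument can be illustrated with a small, self-contained sketch. It is not the ContraGIN model: the function name `gin_style_embedding`, the constant `MOD`, and the integer squaring step (standing in for the learned MLP of a GIN layer) are assumptions made purely for illustration. The sketch only shows that sum aggregation over neighbors followed by a sum readout gives the same result for two atom orderings of the same molecule, here the orderings implied by the equivalent SMILES strings `CCO` and `OCC` for ethanol.

```python
# Toy illustration only: an integer hash replaces the learned MLP of a GIN layer.
MOD = 1_000_003  # arbitrary prime (assumption) that keeps features bounded


def gin_style_embedding(atomic_numbers, bonds, rounds=2, eps=1):
    """Return an order-independent integer summary of a molecular graph.

    atomic_numbers: list of ints, one per atom (node features)
    bonds: list of (i, j) index pairs (undirected edges)
    """
    n = len(atomic_numbers)
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    h = list(atomic_numbers)
    for _ in range(rounds):
        # GIN-style update: (1 + eps) * own feature + sum of neighbor features,
        # followed by a fixed nonlinearity (squaring modulo a prime).
        h = [
            (((1 + eps) * h[v] + sum(h[u] for u in neighbors[v])) ** 2) % MOD
            for v in range(n)
        ]
    # Sum readout: integer addition is exactly independent of atom order.
    return sum(h)


# Ethanol with atoms listed in the order of SMILES "CCO" and of SMILES "OCC".
ethanol_cco = ([6, 6, 8], [(0, 1), (1, 2)])
ethanol_occ = ([8, 6, 6], [(0, 1), (1, 2)])
# Dimethyl ether ("COC"): same atoms, different connectivity.
dimethyl_ether = ([6, 8, 6], [(0, 1), (1, 2)])

same = gin_style_embedding(*ethanol_cco) == gin_style_embedding(*ethanol_occ)
differs = gin_style_embedding(*ethanol_cco) != gin_style_embedding(*dimethyl_ether)
print("equivalent SMILES give equal embeddings:", same)
print("different molecule gives a different embedding:", differs)
```

A sequence model, by contrast, receives `CCO` and `OCC` as different token sequences, so any invariance to the choice of string has to be learned from data rather than being built into the architecture.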