The Efficacy of Pre-training in Chemical Graph Out-of-distribution Generalization

Published: 17 Jun 2024, Last Modified: 24 Jul 2024 · ICML 2024 AI4Science Poster · CC BY 4.0
Keywords: Graph Neural Networks, Out-Of-Distribution Generalization, Self-Supervised Pre-training
TL;DR: We introduce a benchmark for assessing pre-trained models in chemical graph out-of-distribution (OOD) scenarios. Our findings reveal that pre-trained models yield results comparable to methods specifically tailored for OOD generalization.
Abstract: Graph neural networks have shown significant progress on various tasks, yet their ability to generalize in out-of-distribution (OOD) scenarios remains an open question. In this study, we conduct a comprehensive benchmark, named PODGenGraph, of the efficacy of chemical graph pre-trained models under OOD challenges. We perform extensive experiments across diverse chemical graph datasets spanning different graph sizes. The benchmark is framed around distinct distribution shifts, covering both concept and covariate shifts, while also varying the degree of shift. Our findings are striking: even basic pre-trained models perform not only comparably to, but often better than, models specifically designed to handle distribution shift. We further examine the influence of key factors (e.g., sample size, learning rate, and in-distribution performance) on the OOD generalization of pre-trained models. Overall, our work shows that pre-training can be a flexible and simple approach to OOD generalization in chemical graph learning. Leveraging pre-trained models for chemical graph OOD generalization in real-world applications stands as a promising avenue for future research.
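
The evaluation protocol the abstract describes reduces to fine-tuning a pre-trained graph encoder on an in-distribution split and scoring it on a held-out OOD split. Below is a minimal sketch of that loop in PyTorch Geometric; the GIN backbone, the checkpoint path pretrained_gin.pt, and the split variables id_train_split / ood_test_split are illustrative assumptions, not the paper's released code.

```python
# Sketch: fine-tune a (hypothetically) pre-trained GNN on in-distribution
# data, then evaluate on an out-of-distribution test split.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINConv, global_mean_pool
from torch_geometric.loader import DataLoader


class GIN(torch.nn.Module):
    """A plain GIN graph classifier standing in for a pre-trained backbone."""

    def __init__(self, in_dim: int, hid: int = 300, num_classes: int = 2, layers: int = 5):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        dims = [in_dim] + [hid] * layers
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            mlp = torch.nn.Sequential(
                torch.nn.Linear(d_in, d_out), torch.nn.ReLU(),
                torch.nn.Linear(d_out, d_out))
            self.convs.append(GINConv(mlp))
        self.head = torch.nn.Linear(hid, num_classes)  # task head, trained from scratch

    def forward(self, data):
        x = data.x.float()  # cast integer atom features for the linear layers
        for conv in self.convs:
            x = F.relu(conv(x, data.edge_index))
        return self.head(global_mean_pool(x, data.batch))  # graph-level readout


def finetune_and_eval(model, train_set, ood_test_set, epochs: int = 30, lr: float = 1e-3):
    """Fine-tune on the ID split, return accuracy on the OOD split."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    for _ in range(epochs):
        model.train()
        for batch in train_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(batch), batch.y.view(-1).long())
            loss.backward()
            opt.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for batch in DataLoader(ood_test_set, batch_size=64):
            pred = model(batch).argmax(dim=-1)
            correct += (pred == batch.y.view(-1)).sum().item()
            total += batch.y.numel()
    return correct / total


# Hypothetical usage; the checkpoint and splits are placeholders:
# model = GIN(in_dim=9)
# model.load_state_dict(torch.load("pretrained_gin.pt"), strict=False)  # encoder only
# ood_acc = finetune_and_eval(model, id_train_split, ood_test_split)
```

In the benchmark's terms, comparing this OOD accuracy against the same architecture trained from scratch (or against an OOD-tailored method) isolates the contribution of pre-training.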
Submission Number: 234