Abstract: Data-driven molecular computing has become an increasingly popular topic in AI for molecular and bioinformatic science. Current molecular modeling relies on Graph Neural Networks (GNNs) to learn representations, but these models mostly fail to generalize to out-of-distribution (OOD) samples. Although recent advances in OOD-oriented graph learning have discovered invariant rationales on graphs, they still ignore two important issues: 1) the growing number and variety of molecules expand the patterns of environments on graphs, causing models based on invariant rationales to fail; and 2) the associations between discovered molecular subgraphs and their corresponding properties are complex, so causal substructures alone cannot fully explain the labels. To this end, we propose an environment causal learning framework, EMoNet, to tackle the unresolved OOD challenge in molecular science. Specifically, we model the graph environments by bypassing invariant subgraphs. We first incorporate chemistry principles into our graph growth generator to imitate environment growth, then devise an environment-GIB to squash out the environment, and finally introduce a cross-attention causal aggregation that allows dynamic interactions between environments and invariances. Experiments on seven datasets demonstrate the strong generalization ability of EMoNet.