Abstract: In real-world scientific discovery, human beings always make use of the accumulated prior knowledge with imagination pick select one or a few most promising hypotheses from large and noisy data analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), which is defined as graph entities and associations have both text-attributed information and numeric information. The TNG is an ideal data structure model for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations or prior knowledge, with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together both the textual information and numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze the TNGs for graph understanding and reasoning. To demonstrate the utility, we generated the text-omic(numeric) signaling graphs (TOSG), as one type of TNGs, in which all graphs have the same entities, associations and annotations, but have sample-specific entity numeric (omic) values using single cell RNAseq (scRNAseq) datasets of different diseases. We proposed joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results showed the LLM-GNN and TNGs models significantly improve classification accuracy and network inference. In conclusion, the TNGs and joint LLM-GNN models are important approaches for scientific discovery.
Loading