ChatPathway: Conversational Large Language Models for Biology Pathway Detection

Published: 28 Oct 2023, Last Modified: 28 Oct 2023, NeurIPS2023-AI4Science Poster
Keywords: Large Language Model, Biology Pathway, ChatGPT, Galactica
Abstract: Biological pathways, such as protein-protein interactions and metabolic networks, are vital for understanding diseases and for drug development. Databases such as KEGG are designed to store and map these pathways. However, many bioinformatics methods are limited by database constraints, and certain deep learning models struggle with the complexity of biochemical reactions involving large molecules and diverse enzymes. Moreover, thorough exploration of biological pathways demands a deep understanding of the scientific literature and past research. Recent advances in Large Language Models (LLMs), especially ChatGPT, show promise in addressing these challenges. We first restructured data from KEGG and augmented it with molecular structural and functional information sourced from UniProt and PubChem. We then evaluated LLMs, particularly GPT-3.5-turbo and Galactica, on predicting biochemical reactions and pathways using our constructed data. We also assessed the models' ability to predict novel pathways not covered in their training data, using findings from recently published studies. While GPT demonstrated strengths in pathway mapping, Galactica encountered challenges. This research highlights the potential of combining LLMs with biology, suggesting that human expertise and AI can work together in decoding biological systems.
Submission Track: Original Research
Submission Number: 176