Identifying Neglected Hypotheses in Neurodegenerative Disease with Large Language Models

Published: 27 Oct 2023, Last Modified: 21 Nov 2023
Venue: GenBio@NeurIPS2023 Poster
Keywords: large language models, structured information extraction, scientific hypothesis formulation, neglected hypotheses
TL;DR: We piloted a largely automated method that uses LLMs, high-dimensional embeddings, and dimensionality reduction to surface neglected scientific hypotheses in neurodegenerative disease research.
Abstract: Neurodegenerative diseases remain a medical challenge, with existing treatments for many such diseases yielding limited benefits. Yet research into diseases like Alzheimer's often focuses on a narrow set of hypotheses, potentially overlooking promising research avenues. We devised a workflow to curate scientific publications, extract central hypotheses using gpt-3.5-turbo, convert these hypotheses into high-dimensional vectors, and cluster them hierarchically. By applying a secondary agglomerative clustering to the "noise" subset, followed by GPT-4 analysis, we identified signals of neglected hypotheses. This methodology surfaced several notable neglected hypotheses, including treatment with coenzyme Q10, CPAP treatment to slow cognitive decline, and lithium treatment in Alzheimer's. We believe this methodology offers a novel and scalable approach to identifying overlooked hypotheses and broadening the neurodegenerative disease research landscape.
Submission Number: 46
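
The two-stage clustering idea in the abstract (cluster the hypothesis vectors, then re-cluster the "noise" subset agglomeratively) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the first-stage algorithm, so scikit-learn's DBSCAN is used here as a stand-in because it labels outliers as noise (-1). The function name `two_stage_cluster`, the parameter values, and the toy 2-D points standing in for hypothesis embeddings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering


def two_stage_cluster(embeddings, eps=0.5, min_samples=5, n_noise_clusters=3):
    """Hypothetical helper: density clustering, then agglomerative on noise.

    Returns (primary, secondary). `primary` holds first-pass labels, with -1
    marking noise. `secondary` holds second-pass labels for noise points only
    and -1 for every point that the first pass already assigned to a cluster.
    """
    # First pass (assumed stand-in): DBSCAN marks sparse points as noise (-1).
    primary = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)
    noise_idx = np.flatnonzero(primary == -1)

    # Second pass: agglomerative clustering restricted to the noise subset.
    secondary = np.full(len(embeddings), -1)
    if len(noise_idx) >= max(2, n_noise_clusters):
        agg = AgglomerativeClustering(n_clusters=n_noise_clusters, linkage="ward")
        secondary[noise_idx] = agg.fit_predict(embeddings[noise_idx])
    return primary, secondary


# Toy 2-D stand-in for hypothesis embeddings (real embeddings would be
# high-dimensional): two dense 5x5 grids play the role of well-studied
# hypotheses, and three isolated pairs play the role of rarely studied ones.
grid = np.stack(np.meshgrid(np.arange(5), np.arange(5)), axis=-1).reshape(-1, 2) * 0.1
sparse = np.array([
    [0.0, 10.0], [0.6, 10.0],
    [10.0, 0.0], [10.6, 0.0],
    [5.0, -10.0], [5.6, -10.0],
])
embeddings = np.vstack([grid, grid + 10.0, sparse])

primary, secondary = two_stage_cluster(embeddings)
print("first-pass labels:", sorted(set(primary.tolist())))
print("second-pass labels for noise points:", secondary[primary == -1].tolist())
```

In this sketch, the small groups recovered in the second pass are the candidates that would then be passed on for further analysis, which the abstract describes as being done with GPT-4.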