Unveiling Zero Shot Prediction for Gene Attributes Through Interpretable AI

Published: 04 Mar 2024, Last Modified: 24 Apr 2024, MLGenX 2024 Poster, CC BY 4.0
Keywords: zero-shot prediction, contrastive learning, LLM, gene representations, embeddings, Gene Ontology, BERT
TL;DR: Zero-shot prediction of gene attributes and cell properties from gene summaries, using gene embeddings enhanced by contrastive learning.
Abstract: Representation learning has transformed the prediction of structures and functions of genes and proteins by employing sequence, expression, and network data. Yet this approach taps into just a fraction of the knowledge accumulated over more than a century of genetic research. Here, we introduce GeneLLM, an interpretable transformer-based model that integrates textual information through contrastive learning to refine gene representations. While it has been posited that such knowledge representation could result in a bias towards well-characterized genes, GeneLLM surprisingly shows high accuracy across eight gene-related benchmarks, not only matching but often outperforming task-specific models, with a 50% increase in accuracy over its closest solubility-specific competitor. It demonstrates robust zero-shot learning capabilities for unseen gene annotations. The model's interpretability and our multimodal strategy for mitigating inherent data biases bolster its utility and reliability, particularly in biomedical applications where interpretability is paramount. Our findings affirm that unstructured text complements structured databases in enhancing biomedical predictions, while conscientiously addressing interpretability and bias for AI deployment in healthcare. The code and datasets are available on request at https://www.avisahuai.com/tools.
Submission Number: 48
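
The abstract mentions zero-shot prediction of unseen gene annotations from embeddings. As a rough illustration of the general idea only, and not GeneLLM's actual procedure, the sketch below assigns a gene to whichever annotation label has the most similar embedding under cosine similarity. The function names, the three-dimensional vectors, and the label names are all hypothetical and hand-made for the example; in practice the vectors would come from a trained text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def zero_shot_label(gene_vec, label_vecs):
    """Return the best-matching label and the similarity score of every label.

    No classifier is trained on the labels: a label only needs an embedding
    in the same space as the gene, which is what makes the prediction zero-shot.
    """
    scores = {name: cosine(gene_vec, vec) for name, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-made embeddings (assumption: real ones would be model outputs).
label_vecs = {
    "DNA repair": [0.9, 0.1, 0.0],
    "ion transport": [0.0, 0.2, 0.9],
}
gene_vec = [0.8, 0.3, 0.1]

best, scores = zero_shot_label(gene_vec, label_vecs)
print(best)
```

Because the labels enter only through their embeddings, a new annotation can be scored by adding one more entry to `label_vecs`, with no retraining step in this sketch.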