Keywords: large language model, protein, sequence prediction, supervised finetuning, in-context learning
TL;DR: We test the supervised finetuning and in-context learning abilities of general-purpose LLMs on the TAPE protein prediction benchmark.
Abstract: Recent years have witnessed the revolution sparked by Large Language Models (LLMs) in almost every AI-related field, and bioinformatics is no exception. While bioinformatics LLMs boost performance on many tasks such as protein structure prediction and DNA generation, three large gaps remain between bioinformatics LLMs and mainstream LLMs: generalizability (diversity of prior knowledge and target tasks), scalability (model sizes), and flexibility (the In-Context Learning (ICL) paradigm). In this work, we aim to bridge these gaps by applying supervised finetuning and in-context learning to general-purpose LLMs for bioinformatics tasks. Experimental results on the TAPE benchmark suggest that broader prior knowledge does not yet help bioinformatics performance, and that in-context learning for bioinformatics tasks is generally still too hard; however, scalability does matter.
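As a rough illustration of the in-context learning setup the abstract describes, the sketch below builds a few-shot prompt for a TAPE-style protein property task. The sequences, labels, and the "Stability" field name are hypothetical placeholders, not the paper's actual prompt format or data.

```python
# Hedged sketch: constructing a few-shot in-context learning (ICL) prompt
# for a protein property task, loosely in the style of TAPE's stability task.
# All sequences and scores below are illustrative placeholders.

def build_icl_prompt(examples, query_sequence, task_description):
    """Format k labeled (sequence, label) demos plus a query into one prompt."""
    lines = [task_description, ""]
    for seq, label in examples:
        lines.append(f"Sequence: {seq}")
        lines.append(f"Stability: {label}")
        lines.append("")
    lines.append(f"Sequence: {query_sequence}")
    lines.append("Stability:")  # the LLM is expected to complete this field
    return "\n".join(lines)

# Placeholder few-shot demonstrations (hypothetical sequences and scores).
demos = [
    ("MKTAYIAKQR", 0.42),
    ("GSSGSSGLVP", -0.17),
]
prompt = build_icl_prompt(
    demos,
    query_sequence="MADEEKLPPG",
    task_description="Predict the stability score of each protein sequence.",
)
print(prompt)
```

A prompt like this would then be sent to a general-purpose LLM, with the model's completion parsed as the predicted label; the finetuning counterpart instead trains on the same (sequence, label) pairs directly.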
Submission Number: 11