BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Code Generation, Benchmark, Large Language Models, Dataset
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Pre-trained language models like ChatGPT have significantly improved code generation. As these models scale up, there is an increasing need for their output to handle more intricate tasks and to be appropriately specialized to particular domains. Bioinformatics is one such domain: generating functional programs here poses notable additional challenges due to the amount of specialized domain knowledge required, the need for complicated data operations, and the intricate functional dependencies between those operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models on generating bioinformatics code. For function-level code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub, as well as 253 examples from the Rosalind Project. BioCoder also incorporates a fuzz-testing framework for evaluation, which we have applied to many models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. The results highlight two key aspects of successful models: 1) they contain specific domain knowledge of bioinformatics (beyond general coding knowledge); and 2) they accommodate a long prompt with full context (i.e., functional dependencies).
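The fuzz-testing evaluation described in the abstract can be pictured as repeatedly sampling inputs and checking a model-generated function against a reference implementation. The Python sketch below is a minimal, hypothetical illustration of that idea only; the helper names (`fuzz_equivalent`, `random_dna_input`), the trial count, and the GC-content task are illustrative assumptions and not taken from the actual BioCoder harness.

```python
import random

def fuzz_equivalent(candidate, reference, input_generator, trials=100):
    """Return True if `candidate` matches `reference` on randomly generated inputs.

    Hypothetical sketch of a fuzz-style functional-equivalence check, in the
    spirit of BioCoder's evaluation; not the benchmark's actual test runner.
    """
    for _ in range(trials):
        args = input_generator()
        try:
            expected = reference(*args)
            actual = candidate(*args)
        except Exception:
            # Generated code crashed (or the sampled input is invalid): count as failure.
            return False
        if actual != expected:
            return False
    return True

# Toy bioinformatics-flavored task: GC content of a DNA sequence.
def reference_gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

def random_dna_input():
    # Sample a random DNA string as the single positional argument.
    length = random.randint(1, 200)
    return ("".join(random.choice("ACGT") for _ in range(length)),)

# A model-generated `candidate_gc_content` would then be scored like this:
# passed = fuzz_equivalent(candidate_gc_content, reference_gc_content, random_dna_input)
```

Randomized input generation of this kind complements fixed unit tests by probing edge cases (e.g., short or skewed sequences) that hand-written cases may miss.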
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1271