BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

ACL ARR 2026 January Submission4413 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Biomedical NLP, Tool-Calling, Dataset Construction
Abstract: Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain largely rely on in-context learning and restrict models to a small set of tools. To address this gap, we introduce BIOTOOL, a comprehensive biomedical tool-calling dataset designed for fine-tuning LLMs. BIOTOOL comprises 34 frequently used tools collected from the NCBI, Ensembl, and UniProt databases, along with 7,040 high-quality, human-verified query–API call pairs spanning variation, genomics, proteomics, evolution, and general biology. Fine-tuning a 4-billion-parameter LLM on BIOTOOL yields substantial improvements in biomedical tool-calling performance, outperforming state-of-the-art commercial LLMs such as GPT-5.1. Furthermore, human expert evaluations demonstrate that integrating a BIOTOOL-fine-tuned tool caller significantly improves downstream answer quality compared to the same LLM without tool usage, highlighting the effectiveness of BIOTOOL in enhancing the biomedical capabilities of LLMs.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: corpus creation, language resources, NLP datasets
Contribution Types: Data resources
Languages Studied: English
Submission Number: 4413