BioSkillSafety: A Systematic Benchmark for Evaluating Agent Skill Safety in Bioinformatics

Bioclaw Team

Published: 28 May 2026, Last Modified: 28 May 2026. Venue: GenBio 2026 Poster. License: CC BY 4.0

Keywords: AI for research, AI for Biology, Agents, safety, openclaw, skills

TL;DR: We present BioSkillSafety, the first systematic framework for evaluating skill-based agent safety in bioinformatics domains.

Abstract: LLM agents have rapidly emerged as transformative tools for biomedical research, yet their safety risks in bioinformatics-specific contexts remain unexplored. We present **BioSkillSafety, the first systematic framework for evaluating skill-based agent safety in bioinformatics domains.** Our six-layer taxonomy achieves 100% coverage across 13 attack cases spanning genomics, transcriptomics, clinical, infrastructure, and external communication domains. Through 429 trials across 11 models and 3 real-world skill repositories, we reveal that all skill libraries exhibit consistent vulnerabilities, model safety varies significantly with backbone selection, and domain-specific patterns demand targeted safeguards. These findings establish standardized benchmarks for trustworthy deployment of biomedical AI agents, contributing to safer and more reliable AI-assisted biomedical research.

Submission Number: 22
