Track: Tiny paper track (up to 4 pages)
Abstract: Agentic AI research assistants, enabled by augmenting large language models with code-execution and tool-use abilities, promise to transform scientific workflows and accelerate biomedical research. In this study, we share preliminary results from our work on evaluating LLM agent capabilities in genomics. We design a simple bioinformatics research agent that is augmented with tool calls and code execution and instructed with a high-level, task-agnostic system prompt. We implement this agent with three frontier LLMs (GPT-4o, o3-mini, and Claude 3.5 Sonnet) and compare their performance. We evaluate the agents on labeling cell types in clustered high-resolution transcriptomic data, a traditionally time-intensive task requiring both manual effort and domain expertise. Our agents are able to complete this task accurately, although performance fluctuates across repeated runs due to hallucination. Overall, our results indicate that LLM agents are capable of autonomously planning and executing genomic analyses with only high-level direction. We are encouraged by these early results and look forward to extending these evaluations in future work.
Submission Number: 37