Keywords: connectomics, proofreading, multimodal, LLM
Abstract: Connectomics, the mapping of neural connections in an organism's brain, currently requires extraordinary human effort to proofread the data produced by imaging and machine-learning-assisted segmentation. With growing excitement around using AI agents to automate important scientific tasks, we explore whether current AI systems can perform the tasks necessary for proofreading such data. We introduce ConnectomeBench, a multimodal benchmark evaluating large language model (LLM) capabilities on three critical proofreading tasks: segment type identification, split error correction, and merge error detection. Using expert-annotated data from two large open-source datasets, a cubic millimeter of mouse visual cortex and the complete Drosophila brain, we evaluate proprietary multimodal LLMs (Claude 3.7/4 Sonnet, o4-mini, GPT-4.1, and GPT-4o) as well as open-source models such as InternVL-3 and NVLM. Our results show that current models achieve surprisingly high performance on segment type identification (52-82% balanced accuracy vs. 20-25% chance) and binary/multiple-choice split error correction (75-85% accuracy vs. 50% chance), while generally struggling on merge error detection. Overall, although the best models still lag behind expert performance, they demonstrate promising capabilities that could eventually allow them to augment, and potentially replace, human proofreading in connectomics.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/jeffbbrown2/ConnectomeBench
Code URL: https://github.com/jffbrwn2/ConnectomeBench
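Since the dataset is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library. Below is a minimal sketch assuming the repository follows the standard Hub layout; the actual configuration and split names (e.g., per-task subsets for segment identification, split errors, and merge errors) are not specified here and should be checked on the dataset card.

```python
# Minimal sketch: load ConnectomeBench from the Hugging Face Hub.
# Assumes a default configuration exists; if the repository defines
# multiple configs (e.g., one per proofreading task), pass the config
# name as the second argument to load_dataset.
from datasets import load_dataset

ds = load_dataset("jeffbbrown2/ConnectomeBench")
print(ds)  # inspect the available splits and their features
```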
Primary Area: AI/ML Datasets & Benchmarks for life sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 2098