Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: laboratory procedure understanding, video benchmark, domain adaptation, V-JEPA, self-supervised learning, vision-language models, AI for science, autonomous laboratory perception
TL;DR: LabProc (a 6-task laboratory video benchmark) and Tacit (a domain-adapted V-JEPA-2.1 encoder) reveal a structural-axis gradient: Claude Opus leads by up to 41 points on language-amenable tasks but trails the ~1000x smaller encoder by 9.1 points on the motion-only subset.
Abstract: Autonomous laboratory systems direct robotic platforms and execute multi-step procedures, but their perception layer is typically a frontier vision-language model (VLM) queried with sampled frames. Whether VLMs are the appropriate perception substrate for laboratory video is an open empirical question. We introduce LabProc, a benchmark for laboratory procedure understanding that organizes six tasks along a structural axis from single-frame state recognition to triplet anchor matching where all clips share the same nominal physical state, and Tacit, a 300M-parameter domain-adapted V-JEPA-2.1 video encoder we release as a vision-only baseline. Across this axis we observe a structural-axis gradient: Claude Opus leads Tacit by 41 points on single-frame state classification and by 6-16 points on tasks that retain text-amenable structure, but on the motion-only TED-Visual Strict Hard subset the gradient reverses and Tacit leads Claude by 9.1 points (66.7% vs. 57.6%) despite a ~1000x parameter asymmetry. Tacit's continued pretraining required 28 minutes on a single H100 ($1.30 in compute) and improves over base V-JEPA-2.1 on every task with a mean uplift of +8.2 points. We also identify a representational tension central to laboratory self-supervised learning: state-invariance objectives (EMA target encoders combined with motion-conditioned masking) collapse within-state temporal physics, attenuating fine-grained motion-progression signals (Same-State CCR Kendall's tau drops from +0.048 at base to -0.062 at the released checkpoint). The released v1 dataset and Tacit checkpoint serve as a calibration target for future laboratory perception modules.
Submission Number: 319