HypoGenVision: A Multimodal AI Agent for Hypothesis Generation from Biological Microscopy Images

Agents4Science 2025 Conference Submission 120 Authors

12 Sept 2025 (modified: 08 Oct 2025), submitted to Agents4Science, CC BY 4.0
Keywords: AI agents, Hypothesis generation, Multimodal learning, Biological microscopy, Scientific discovery, Plausibility and novelty, Responsible AI
TL;DR: This paper introduces HypoGenVision, a multimodal AI agent that generates structured, testable scientific hypotheses from biological microscopy images by integrating visual analysis with language-based reasoning.
Abstract: Scientific discovery fundamentally depends on the formulation of hypotheses, yet this critical step remains dominated by human intuition and serendipity. Current AI systems excel at summarization, classification, and prediction, but rarely contribute directly to the creative and generative act of hypothesis formation. We introduce HypoGenVision, the first multimodal AI agent designed to generate structured, testable scientific hypotheses by integrating microscopy image understanding with language-based reasoning. Unlike prior approaches restricted to text mining or descriptive image analysis, HypoGenVision jointly encodes visual and textual information, generates candidate hypotheses via a biomedical large language model, and ranks them with a plausibility–novelty–testability scoring function. Applied to two benchmark microscopy datasets, our system achieved expert-rated plausibility of 82% and significance of 78%, substantially outperforming strong baselines. We release all resources to ensure full reproducibility. This work demonstrates that multimodal AI agents can engage in hypothesis generation, one of the most creative aspects of science, and marks a step toward AI systems that not only analyze existing data but also help create new scientific knowledge.
Submission Number: 120
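
The ranking step mentioned in the abstract (ordering candidate hypotheses by a plausibility–novelty–testability score) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the functional form of the score, so the weighted sum, the default weights, the `[0, 1]` score range, and the names `Hypothesis`, `combined_score`, and `rank_hypotheses` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hypothesis:
    """A candidate hypothesis with three component scores, assumed to lie in [0, 1]."""
    text: str
    plausibility: float
    novelty: float
    testability: float


def combined_score(h: Hypothesis,
                   weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three components.

    ASSUMPTION: a linear combination with these weights is chosen here
    purely for illustration; the paper's actual scoring function may differ.
    """
    w_p, w_n, w_t = weights
    return w_p * h.plausibility + w_n * h.novelty + w_t * h.testability


def rank_hypotheses(candidates: Sequence[Hypothesis],
                    weights: Sequence[float] = (0.4, 0.3, 0.3)) -> list[Hypothesis]:
    """Return the candidates sorted from highest to lowest combined score."""
    return sorted(candidates,
                  key=lambda h: combined_score(h, weights),
                  reverse=True)


# Placeholder candidates; the texts and scores are made up for the example.
candidates = [
    Hypothesis("candidate C", plausibility=0.3, novelty=0.3, testability=0.3),
    Hypothesis("candidate A", plausibility=0.9, novelty=0.2, testability=0.8),
    Hypothesis("candidate B", plausibility=0.5, novelty=0.9, testability=0.5),
]
ranked = rank_hypotheses(candidates)
```

Changing `weights` shifts the trade-off: for example, a larger novelty weight would favor unusual candidates over safer, more plausible ones.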