Keywords: LLM application, hypothesis generation, computational social science
TL;DR: ExperiGen unifies hypothesis generation and validation, discovering novel, experimentally verified hypotheses with state-of-the-art predictive performance.
Abstract: Automating hypothesis generation has the potential to accelerate discovery across disciplines, especially in data-driven sciences such as psychology and behavioral research. At its core, the scientific method is a cycle of observation, hypothesis generation, and experimental validation. Existing methods, however, exhibit a fundamental dichotomy: generation approaches propose hypotheses without experimentally validating them, while validation approaches are limited to structured tabular settings, which narrows the scope and impact of both. To address this gap, we introduce ExperiGen, a collaborative agentic framework that couples a Generator, which proposes natural-language hypotheses, with an Experimenter, which programmatically constructs features, executes statistical tests, and returns evidence for iterative refinement. This coupling enables the discovery of hypotheses that are not only experimentally verified but also more predictive, and it addresses the bottleneck of relying solely on LLM in-context reasoning. As a result, ExperiGen extends hypothesis discovery beyond text to visual domains, including tasks such as image memorability and layout design preference (e.g., “designs with clear visual hierarchy are more aesthetic”). On existing hypothesis-generation benchmarks, ExperiGen achieves 5-8% absolute gains over existing methods while also producing substantially more statistically significant hypotheses. Finally, we conduct a large-scale industrial A/B test on a Fortune 500 company’s webpage, making ExperiGen the first method whose AI-generated hypotheses yielded statistically significant improvements in a real-world field setting.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22215
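The Experimenter half of the Generator/Experimenter loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not ExperiGen's implementation. Several parts are assumptions. The Generator is stubbed as a fixed list of hypotheses paired with hand-written feature functions, whereas in ExperiGen an LLM proposes the hypotheses and the Experimenter constructs the features programmatically. A two-sided permutation test on a difference in means stands in for whatever statistical tests the framework actually runs. The layout data is synthetic. All names (`run_experiment`, `experimenter_round`, `Evidence`, the `hierarchy`/`long_title` attributes) are illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Item = Dict[str, bool]            # hypothetical design attributes
Feature = Callable[[Item], bool]  # programmatic stand-in for one hypothesis


@dataclass
class Evidence:
    hypothesis: str
    effect: float    # mean outcome with the feature minus mean outcome without it
    p_value: float
    supported: bool


def _mean(values: Sequence[float]) -> float:
    # fsum is exactly rounded, so the result does not depend on element order
    return math.fsum(values) / len(values)


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means (assumed test choice)."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):])) >= observed:
            hits += 1
    # add-one correction keeps the p-value estimate strictly positive
    return (hits + 1) / (n_perm + 1)


def run_experiment(hypothesis: str, feature: Feature, items: List[Item],
                   outcomes: List[float], alpha: float = 0.05) -> Evidence:
    flags = [feature(x) for x in items]
    with_f = [y for f, y in zip(flags, outcomes) if f]
    without_f = [y for f, y in zip(flags, outcomes) if not f]
    if not with_f or not without_f:
        # the feature never (or always) fires, so these data cannot test it
        return Evidence(hypothesis, 0.0, 1.0, False)
    p = permutation_p_value(with_f, without_f)
    return Evidence(hypothesis, _mean(with_f) - _mean(without_f), p, p < alpha)


def experimenter_round(candidates: List[Tuple[str, Feature]], items: List[Item],
                       outcomes: List[float]) -> List[Evidence]:
    """Test every candidate; the full evidence list, rejections included,
    is what a Generator would condition on when proposing revisions."""
    return [run_experiment(h, f, items, outcomes) for h, f in candidates]


# Synthetic data (hypothetical): 30 layouts; even-indexed ones have a clear
# visual hierarchy and score 2 points higher, plus a small deterministic noise term.
items: List[Item] = [{"hierarchy": i % 2 == 0, "long_title": i % 3 == 0}
                     for i in range(30)]
outcomes = [(7.0 if x["hierarchy"] else 5.0) + ((2 * i) % 5 - 2) * 0.1
            for i, x in enumerate(items)]

# Stub Generator output: (natural-language hypothesis, feature extractor) pairs.
candidates: List[Tuple[str, Feature]] = [
    ("designs with clear visual hierarchy are more aesthetic", lambda x: x["hierarchy"]),
    ("designs with long titles are more aesthetic", lambda x: x["long_title"]),
]

evidence = experimenter_round(candidates, items, outcomes)
for ev in evidence:
    print(f"{ev.supported!s:5}  effect={ev.effect:+.2f}  p={ev.p_value:.4f}  {ev.hypothesis}")
```

On this toy data the hierarchy hypothesis is supported (effect of about +2, small p-value), while the long-title hypothesis shows no effect and is rejected. In a full loop, that rejection and its statistics would be returned to the Generator as evidence for the next round of refinement.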