Automated Prototyping of Behavioral Experiments with Large Language Models

Alessandra Brondetta; Sebastian Musslick

Published: 30 May 2026, Last Modified: 30 May 2026, Venue: ICML2026-AI4Science Oral, License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

TL;DR: A closed-loop framework for in silico prototyping of behavioral experiments, instantiated with LLMs and demonstrated on the Wisconsin Card Sorting Test by discovering task framings that elicit perseverative responding in synthetic participants.

Abstract: Piloting behavioral experiments is a critical yet resource-intensive step in behavioral research. Scientists often rely on intuition and repeated data collection before arriving at experimental designs that elicit desired behavioral phenomena. To address this challenge, we introduce a closed-loop framework for in silico prototyping of behavioral experiments, in which an LLM-based AI scientist iteratively proposes experimental designs and revises them based on the behavior of participant LLMs. We formalize this as a black-box optimization problem in which the experimentalist minimizes a loss defined over behavioral metrics of interest — a formulation that admits any optimizer, any participant population, and any parameterizable experimental component. We illustrate this approach in the context of task framing, the narrative cover stories that introduce participants to experimental tasks. Using the Wisconsin Card Sorting Test, a canonical paradigm of cognitive flexibility, we show that the framework can discover framings that indirectly modulate perseverative responding in synthetic participants without explicit instruction to do so. Our findings highlight the potential of AI scientists to accelerate the design cycle in behavioral research, enabling cost-effective exploration of experimental design spaces prior to in vivo validation with human participants, and positioning such systems as practical tools on the path toward more autonomous discovery in the behavioral sciences.
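The closed loop described in the abstract (propose a design, run synthetic participants, score the resulting behavior against a target, revise) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the design is reduced to a single number in [0, 1] standing in for a task framing, `toy_propose` is a simple hill-climbing stand-in for the LLM-based experimentalist, `toy_participants` is a noisy linear stand-in for participant LLMs that returns a perseverative error rate, and the loss is a squared distance to a target rate.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[float, float]]  # (design, loss) pairs observed so far


def prototype_experiment(
    propose: Callable[[History, random.Random], float],
    run_participants: Callable[[float, random.Random], float],
    target: float,
    n_iterations: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Black-box loop: propose a design, observe behavior, score it, repeat."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(n_iterations):
        design = propose(history, rng)          # experimentalist step
        metric = run_participants(design, rng)  # synthetic participants step
        loss = (metric - target) ** 2           # loss over a behavioral metric
        history.append((design, loss))
    # Return the design with the lowest observed loss.
    return min(history, key=lambda pair: pair[1])


def toy_propose(history: History, rng: random.Random) -> float:
    """Hypothetical optimizer: perturb the best design seen so far."""
    if not history:
        return rng.random()
    best_design, _ = min(history, key=lambda pair: pair[1])
    return min(1.0, max(0.0, best_design + rng.gauss(0.0, 0.1)))


def toy_participants(design: float, rng: random.Random) -> float:
    """Hypothetical participants: error rate rises with the design value."""
    return 0.1 + 0.5 * design + rng.gauss(0.0, 0.01)


# Search for a design that elicits a perseverative error rate near 0.4.
best_design, best_loss = prototype_experiment(
    toy_propose, toy_participants, target=0.4
)
print(f"best design: {best_design:.3f}, observed loss: {best_loss:.6f}")
```

Because the loop only calls `propose` and `run_participants`, either stand-in could in principle be swapped for an LLM-backed function without changing the loop itself, which is the sense in which the formulation admits any optimizer and any participant population.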

Keywords: Large Language Models, Behavioral Experiments, Automated Scientific Discovery, In Silico Experimentation, Optimal Experimental Design

Submission Number: 296
