Keywords: Scientific reasoning, Naïve scientific theories, Cognitive development, Persona prompting, Piagetian stages, CHILDES, Human-AI alignment
TL;DR: This study tests GPT-4o’s scientific reasoning capabilities, finding that while it doesn’t match adult reasoning, prompting with developmental personas, especially one based on Piaget’s pre-operational stage, elicits human-like response patterns and developmental trends.
Abstract: Cognitive scientists are increasingly exploring Large Language Models (LLMs) as models of human reasoning, including analogical and fluid reasoning. Here, we investigate whether GPT-4o replicates key patterns of human scientific reasoning performance. One such pattern in humans, revealed by developmental research, is that the naïve scientific theories of childhood are not entirely supplanted by the normative scientific theories learned later in school. Instead, the two co-exist, and when they make inconsistent predictions, adults actively suppress the naïve theory, leading to slower and less accurate responses. This motivates our first question: Does GPT-4o exhibit similar interference when normative and naïve theories conflict? Experiment 1 tested this using a baseline task prompt that established the persona of a college student. The model failed to replicate the human pattern of poorer performance on statements where naïve and normative theories conflict. To explore whether developmental cues could produce more human-like reasoning, Experiment 2 asked whether GPT-4o can model the developmental trajectory of scientific reasoning, instantiating personas of children of different ages using two approaches: textbook descriptions of Piagetian stages and transcripts of child-directed speech from the CHILDES database. The textbook-defined personas yielded encouraging results: earlier stages showed greater difficulty with inconsistent statements, while later stages exhibited the expected developmental improvement. In humans, the influence of naïve theories is attributed to the cost of their top-down suppression during reasoning. For LLMs, we propose that performance is instead shaped bottom-up by the retrieval context, specifically the prompt or persona in the model’s context window. Future research on LLMs as cognitive models may benefit from focusing on how contextual framing shapes reasoning behavior.
Paper Track: Technical paper
Submission Number: 10
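For concreteness, here is a minimal sketch of the persona-prompting setup described in the abstract, assuming the OpenAI Python SDK and the gpt-4o model; the persona wording and test statement below are illustrative placeholders, not the authors' actual prompts or stimuli:

```python
# Minimal persona-prompting sketch. Assumptions: OpenAI Python SDK (>=1.0),
# "gpt-4o" model name, OPENAI_API_KEY set in the environment. The persona
# text and statement are hypothetical, not the paper's materials.
from openai import OpenAI

client = OpenAI()

# Textbook-style description of Piaget's pre-operational stage, used to
# instantiate a child persona via the system message (hypothetical wording).
PERSONA = (
    "You are a 4-year-old child in Piaget's pre-operational stage. "
    "You reason intuitively, focus on how things look, and have not yet "
    "mastered conservation or formal logic."
)

# A statement on which naive and normative theories conflict
# (illustrative example in the style of the stimuli described).
STATEMENT = "True or false: The Earth goes around the Sun."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": STATEMENT},
    ],
    temperature=0,  # deterministic-leaning output for comparable trials
)
print(response.choices[0].message.content)
```

Swapping the system message for a different stage description, or for a CHILDES-derived transcript excerpt, would instantiate the other personas; the baseline condition would use a college-student persona instead.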