Comparing the reliability of individual differences for various measurement models in conflict tasks

Michelle C. Donzallaz, Udo Boehm, Andrew Heathcote, Chris Donkin, Dora Matzke, Julia M. Haaf

Published: 01 Jan 2026, Last Modified: 12 Feb 2026. Psychonomic Bulletin & Review. License: CC BY-SA 4.0
Abstract: There is a growing realization that experimental tasks that produce reliable effects in group comparisons can simultaneously provide unreliable assessments of individual differences. Proposed solutions to this “reliability paradox” range from collecting more trials to modifying the tasks and/or the way in which effects are measured from them. Here, we systematically compare two proposed modeling solutions in a cognitive conflict task. Using the ratio of individual variability in the conflict effect (i.e., signal) to trial-by-trial variation in the data (i.e., noise), obtained from Bayesian hierarchical modeling, we examine whether improved statistical modeling can improve the reliability of individual-differences assessment in four Stroop datasets. The proposed improvements are (1) increasing the descriptive adequacy of the statistical models from which conflict effects are derived, and (2) using psychologically motivated measures from cognitive measurement models. Our results show that the type of model does not have a consistent effect on the signal-to-noise ratio: the proposed solutions improved reliability in only one of the four datasets. We provide analytical and simulation-based approaches to compute the signal-to-noise ratio for a range of models of varying sophistication and discuss their potential to aid in developing and comparing new measurement solutions to the reliability paradox.
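The signal-to-noise ratio described in the abstract can be illustrated with a minimal simulation. The sketch below is not the authors' code and does not use their hierarchical Bayesian models; it simply assumes a generative model in which each participant has a true Stroop conflict effect drawn from a population distribution (the "signal" SD) and each trial adds Gaussian noise (the "noise" SD), with all numeric values (effect sizes, trial counts, millisecond scales) chosen purely for illustration.

```python
# Illustrative sketch of the signal-to-noise idea from the abstract:
# SNR = (between-person SD of the conflict effect) / (trial-by-trial SD).
# All parameter values below are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(1)

n_subj, n_trials = 50, 100    # participants, trials per condition (assumed)
sigma_effect = 30.0           # SD of individual conflict effects in ms (signal)
sigma_trial = 150.0           # trial-by-trial noise SD in ms (noise)

# True per-person conflict effects around an assumed 60 ms group mean
true_effect = rng.normal(60.0, sigma_effect, n_subj)

# Simulate response times: congruent baseline 500 ms; incongruent trials
# add each participant's true conflict effect
congruent = rng.normal(500.0, sigma_trial, (n_subj, n_trials))
incongruent = rng.normal(500.0 + true_effect[:, None],
                         sigma_trial, (n_subj, n_trials))

# Population signal-to-noise ratio under this generative model
snr_true = sigma_effect / sigma_trial

# Naive sample-based estimate: the variance of observed per-person effects
# mixes true signal variance with sampling noise from finite trials, so
# subtract the known sampling-variance contribution before taking the root
obs_effect = incongruent.mean(axis=1) - congruent.mean(axis=1)
var_sampling = 2 * sigma_trial**2 / n_trials
signal_var = max(obs_effect.var(ddof=1) - var_sampling, 0.0)
snr_hat = np.sqrt(signal_var) / sigma_trial

print(f"true SNR = {snr_true:.3f}, estimated SNR = {snr_hat:.3f}")
```

The correction step makes the key point behind the reliability paradox: with few trials, sampling variance dominates the observed between-person variance, so raw difference scores overstate noise and understate reliable individual differences. The paper's hierarchical models perform this partitioning within a full Bayesian model rather than by the moment-matching shortcut shown here.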