Abstract: When examining the effects of a continuous variable x on an outcome y, a researcher might
choose to dichotomize on x, dividing the population into two sets—low x and high x—and
testing whether these two subpopulations differ with respect to y. Dichotomization has long
been known to incur a cost in statistical power, but there remain circumstances in which it is
appealing: an experimenter might use it to control for confounding covariates through subset
selection, by carefully choosing a subpopulation of Low and a corresponding subpopulation
of High that are balanced with respect to a list of control variables, and then comparing the
subpopulations’ y values. This “divide, select, and test” approach is used in many papers
throughout the psycholinguistics literature, and elsewhere. Here we show that, despite their
apparent innocuousness, these methodological choices can lead to erroneous results in two
ways. First, if the balanced subsets of Low and High are selected in certain ways, it is possible
to infer a relationship between x and y that is not present in the full population. Specifically, we
show that previously published conclusions drawn from this methodology—about the effect of a
particular lexical property on spoken-word recognition—do not in fact appear to hold. Second,
if the balanced subsets of Low and High are selected randomly, this methodology frequently
fails to show a relationship between x and y that is present in the full population. Our work
uncovers a new facet of an ongoing research effort: to identify and reveal the implicit freedoms
of experimental design that can lead to false conclusions.
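
As a rough illustration of the second failure mode, the following is a minimal simulation sketch, not the procedure used in the paper. It assumes a synthetic population in which y depends linearly on x, a median split into Low and High, and randomly drawn subsets of equal size; the balancing on control variables is omitted for brevity. All names and parameter values (`simulate`, `slope`, `subset_size`, and so on) are hypothetical, and the 1.96 cutoff is a normal approximation to a two-sided test at the 0.05 level.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def correlation_t(xs, ys):
    """t statistic for the Pearson correlation over the full population."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def divide_select_test(xs, ys, subset_size, rng):
    """Median-split on x, draw random subsets of Low and High, compare their y values."""
    cut = statistics.median(xs)
    low = [y for x, y in zip(xs, ys) if x <= cut]
    high = [y for x, y in zip(xs, ys) if x > cut]
    return welch_t(rng.sample(high, subset_size), rng.sample(low, subset_size))


def simulate(trials=200, n=400, subset_size=30, slope=0.2, seed=0):
    """Return the fraction of trials in which each analysis detects the true effect."""
    rng = random.Random(seed)
    hits_full = hits_subset = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + rng.gauss(0, 1) for x in xs]  # assumed linear effect plus noise
        hits_full += abs(correlation_t(xs, ys)) > 1.96
        hits_subset += abs(divide_select_test(xs, ys, subset_size, rng)) > 1.96
    return hits_full / trials, hits_subset / trials


if __name__ == "__main__":
    full, subset = simulate()
    print(f"detection rate, full population: {full:.2f}")
    print(f"detection rate, divide-select-test: {subset:.2f}")
```

Under these assumptions, the full-population analysis should detect the relationship far more often than the comparison of randomly selected Low and High subsets, which is the kind of missed effect described above.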