Keywords: monoculture, multiplicity, large language models
TL;DR: We systematically evaluate the concerns of multiplicity and monoculture in a suite of large language models and prediction tasks.
Abstract: Two narratives about machine learning ecosystems have grown out of recent algorithmic fairness discourse. In one, dubbed \emph{monoculture}, algorithmic ecosystems tend toward homogeneity, akin to a single model making all decisions; individuals then face the risk of systematic exclusion with no recourse. In the other, \emph{model multiplicity}, many models solve the same task with similar accuracy, causing excessive variation in outcomes. Both narratives are compelling yet seemingly at odds: model multiplicity cannot exist in a strict monoculture. In this work, we conduct a comprehensive empirical evaluation of both claims. We work from the premise that decision makers will increasingly use large language models for consequential prediction tasks. We therefore examine 50 language models (open-source models ranging from 1B to 141B parameters as well as state-of-the-art commercial models) under 4 different prompt variations and across 6 different prediction tasks. Evaluating both new and existing quantitative measures of monoculture and multiplicity, we find that the empirical landscape sits between the two extremes. Each narrative finds some empirical support, but neither is dominant: systematic exclusion with no recourse is rare, yet model similarity is real. Even when starting from a single model, prompt variation induces some diversity in predictions. Our results contribute critical empirical grounding to ongoing debates and point toward a middle ground between monoculture and multiplicity as the most realistic outcome.
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 28266