Keywords: Large Language Models, LLMs, Wisdom of Crowds, Ensemble Methods, Collective Intelligence, Agent-Based Modeling, Vision-Language Models
TL;DR: For ambiguous vision-based estimation tasks, aggregating deterministic (temperature 0) outputs from a diverse set of models produces a "wisdom of crowds" effect that outperforms any single model.
Abstract: The "wisdom of crowds" phenomenon shows that aggregating independent estimates can yield more accurate predictions than individual guesses. While crowdsourcing is widely applied, using large language models (LLMs) for collective estimation is largely unexplored. This work investigates how best to form an LLM "crowd" for ambiguous vision-based estimation tasks. We explore two sources of diversity: response diversity, from sampling at various temperatures, and model diversity, from using different LLM architectures. We evaluate these approaches on three vision-based datasets: human height-weight pairs, small objects with known weights, and Amazon products with their prices. Our results show that aggregating deterministic (temperature 0) outputs from a diverse set of models is the most effective strategy, outperforming both any single model and ensembles that rely on stochastic sampling at higher temperatures. We find that temperature-induced diversity introduces more noise than signal. The median aggregation of deterministic responses from multiple models outperformed 67% of individual guesses on average, a figure that rises to 75% when relevant context is provided, demonstrating that model diversity is the key to leveraging the wisdom of LLM crowds. By establishing core principles for forming an effective LLM crowd, this work provides a stepping stone for more complex, LLM-driven social simulations.
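The aggregation strategy the abstract describes (one deterministic estimate per model, combined with the median) can be sketched as follows. This is a minimal illustration, not the paper's code: the per-model callables (`model_a`, `model_b`, `model_c`) and the `crowd_estimate` helper are hypothetical stand-ins for temperature-0 calls to different vision-language models.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical per-model query functions: each takes a prompt (and, in the
# vision setting, an image reference) and returns a numeric estimate parsed
# from that model's deterministic (temperature 0) response.
ModelFn = Callable[[str], float]


def crowd_estimate(models: Dict[str, ModelFn], prompt: str) -> float:
    """Collect one temperature-0 estimate per model and return the median."""
    estimates: List[float] = []
    for name, query in models.items():
        value = query(prompt)  # one deterministic guess per model
        estimates.append(value)
        print(f"{name}: {value}")
    return statistics.median(estimates)


if __name__ == "__main__":
    # Stand-in callables; in practice these would wrap API calls to
    # different vision-language models, each with temperature=0.
    crowd = {
        "model_a": lambda p: 72.0,
        "model_b": lambda p: 65.0,
        "model_c": lambda p: 80.0,
    }
    print("crowd median:", crowd_estimate(crowd, "Estimate this person's weight in kg."))
```

The median is used rather than the mean because it is robust to a single model producing an outlying estimate, which matches the aggregation reported in the abstract.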
Submission Number: 25