Guesstimation, the task of making approximate quantity estimates of physical objects or events, is a common real-world challenge. However, it has been largely overlooked in large language model (LLM) research. We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED. These datasets range from concrete object estimation (e.g., how many marbles fit in a one-cup measuring cup) to abstract scenario prediction (e.g., the outcome of the 2024 U.S. presidential election). Inspired by the social science concept of the "Wisdom of Crowds" (WOC), taking the median of a crowd's estimates, which has proven effective in guesstimation, we propose a "WOC decoding" strategy for LLM guesstimation. We replicate prior findings that WOC improves human guesstimation accuracy and show that LLMs exhibit a similar WOC effect. The success of LLMs in guesstimation suggests they possess some level of the "world model" that guesstimation requires. Moreover, WOC decoding improves LLM guesstimation accuracy more efficiently than other decoding methods, such as self-consistency. These results highlight the value of the WOC decoding strategy for LLMs and position guesstimation as a probe for evaluating LLMs' world models. As an LLM's world model is a fundamental prerequisite for many real-world tasks (e.g., forecasting and human-AI teaming), our findings have broad implications for the AI community.
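The core of WOC decoding is aggregating multiple stochastic samples by their median rather than by majority vote. Below is a minimal sketch of this idea, assuming a hypothetical `sample_fn` callable (not from the paper) that returns one temperature-sampled completion per call; the number-parsing regex is likewise an illustrative simplification.

```python
import re
import statistics
from typing import Callable, List


def woc_decode(sample_fn: Callable[[str], str], prompt: str, n: int = 20) -> float:
    """Wisdom-of-Crowds decoding: draw n stochastic samples from the model
    and return the median of the parsed numeric estimates."""
    estimates: List[float] = []
    for _ in range(n):
        text = sample_fn(prompt)  # one temperature-sampled completion
        match = re.search(r"-?\d+(?:\.\d+)?", text)  # first number in the reply
        if match:
            estimates.append(float(match.group()))
    if not estimates:
        raise ValueError("no numeric estimate found in any sample")
    return statistics.median(estimates)
```

Using the median, as in WOC, makes the aggregate robust to outlier samples on a continuous answer space, whereas self-consistency's majority vote only applies cleanly when answers repeat exactly.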