Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

ACL ARR 2025 February Submission4389 Authors

15 Feb 2025 (modified: 09 May 2025). License: CC BY 4.0
Abstract:

Guesstimation, the task of making approximate quantity estimates about a physical object or an event, is a common real-world challenge. However, it has been largely overlooked in large language model (LLM) research. We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED. These datasets span concrete object estimation (e.g., how many marbles fit in a one-cup measuring cup) to abstract scenario prediction, such as predicting the 2024 U.S. presidential election result. Inspired by the social science concept of the "Wisdom of Crowds" (WOC), taking the median of a crowd's estimates, which has proven effective in guesstimation, we propose the "WOC decoding" strategy for LLM guesstimation. We replicate prior findings that WOC improves human guesstimation accuracy and show that LLMs exhibit a similar WOC effect. The success of LLMs in guesstimation suggests they possess some level of a "world model" necessary for guesstimation. Moreover, the WOC decoding method improves LLM guesstimation accuracy more efficiently than other decoding methods, such as self-consistency. These results highlight the value of the WOC decoding strategy for LLMs and position guesstimation as a probe for evaluating LLMs' world models. As an LLM's world model is a fundamental prerequisite for many real-world tasks (e.g., forecasting and human-AI teaming), our findings have broad implications for the AI community.
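The abstract's core idea, sampling multiple estimates from a model and aggregating them with the median, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sampled response strings and the number-extraction regex are assumptions, and in practice the responses would come from repeated stochastic LLM generations for the same prompt.

```python
import re
import statistics

def parse_estimate(text):
    """Extract the first numeric value from a model response; None if absent."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None

def woc_decode(responses):
    """Wisdom-of-Crowds decoding: aggregate sampled estimates via the median."""
    estimates = [e for e in (parse_estimate(r) for r in responses) if e is not None]
    return statistics.median(estimates)

# Hypothetical sampled responses to "How many marbles fit in a one-cup measuring cup?"
samples = ["About 120 marbles.", "Roughly 90.", "150", "I'd guess 110 marbles", "100"]
print(woc_decode(samples))  # median of [120, 90, 150, 110, 100] -> 110.0
```

Unlike self-consistency, which takes a majority vote over exact answers, the median is robust to outlier estimates on a continuous scale, which is why it suits quantity estimation.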

Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: prompting
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 4389