Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Language Modeling and Analysis of Language Models
Submission Track 2: Theme Track: Large Language Models and the Future of NLP
Keywords: Large Language Models, Problem-solving abilities, Stumpers, Cognitive abilities, Human performance, Riddles
TL;DR: This study examines Large Language Models' problem-solving capabilities on stumpers, revealing that they outperform humans at solving these riddles but lag behind in verifying the solutions.
Abstract: This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers, single-step intuition problems that pose challenges for human solvers but are easily verifiable. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) to that of human participants. Our findings reveal that the new-generation LLMs excel at solving stumpers and surpass human performance. However, humans exhibit superior skill in verifying solutions to the same problems. This research deepens our understanding of LLMs' cognitive abilities and offers insights for improving their problem-solving potential across various domains.
Submission Number: 668