Keywords: LLMs, mathematics, reasoning, evaluation
TL;DR: We show, contrary to the optimism about LLMs' problem-solving abilities, that there exist comparatively simple problems that no existing LLM solves.
Abstract: Recent gold medals attained by LLMs at the International Mathematical Olympiad (IMO) have fueled optimism about LLMs' problem-solving abilities. Contrary to this optimism, we show that a problem exists, Yu Tsumura's 554th problem, that a) is within the scope of an IMO problem in terms of proof sophistication, b) is not a combinatorics problem, a category that has caused issues for LLMs, c) requires fewer proof techniques than typical hard IMO problems, d) has a publicly available solution (likely in the training data of LLMs), and e) cannot be readily solved by \emph{any} existing off-the-shelf LLM (commercial or open-source). We include an analysis of the output traces of 16 SOTA LLMs. Additionally, we compare the generic LLM output to a new proof by a former IMO participant, produced in a small study, which is significantly better motivated than the original, publicly available proof, and we elaborate on the differences between LLM and human proof quality.
Primary Area: datasets and benchmarks
Submission Number: 19861