Abstract: Large Language Models (LLMs) successfully recognize patterns in vast amounts of text data and use these patterns for various tasks, including reasoning and text generation. In this work, we investigate the application of LLMs (with and without reasoning capabilities) to different aspects of linguistic puzzle solving. We demonstrate that LLMs outperform humans on most puzzles across a range of linguistic topics; however, on puzzles centered on understanding writing systems, they perform worse than humans. We also present results from several experiments using LLMs for the novel task of linguistic puzzle generation. While LLMs show potential in generating interesting linguistic puzzles, this type of creative task remains beyond the current capabilities of even the most advanced LLMs.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; multilingual corpora; automatic creation and evaluation of language resources; evaluation methodologies; evaluation; datasets for low resource languages; metrics
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: Georgian, Greek, Gujarati, Spanish, and others
Submission Number: 2716