Abstract: Large Language Models (LLMs) successfully recognize patterns in vast amounts of text data and use these patterns for various tasks, including reasoning and text generation. In this work, we investigate the application of LLMs (with and without reasoning capabilities) to different aspects of linguistic puzzle solving. We demonstrate that LLMs outperform humans on most puzzles across a range of linguistic topics; however, on puzzles centered on understanding writing systems, they perform worse than humans. We also present results from several experiments using LLMs for the novel task of linguistic puzzle generation. While LLMs show potential in generating interesting linguistic puzzles, this type of creative task remains beyond the current capabilities of even the most advanced LLMs.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; multilingual corpora; automatic creation and evaluation of language resources; evaluation methodologies; evaluation; datasets for low resource languages; metrics
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: Georgian, Greek, Gujarati, Spanish, and others
Submission Number: 2716