Large Language Models as Rational Players in Competitive Economics Games

20 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: large language models, evaluation, economics, agents, game theory
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose using competitive economics games to evaluate the degree of rationality, the strategic reasoning ability, and the instruction-following ability of agents based on large language models.
Abstract: Large language models (LLMs) have been extensively used as the backbones of general-purpose agents, and the economics literature suggests that LLMs are capable of playing various types of economics games. Following these works, and to overcome the limitations of evaluating LLMs with static benchmarks, we propose competitive games as an evaluation of the rationality and strategic reasoning ability of LLMs. By varying the game history revealed to LLM-based players, we find that most LLMs are rational in the sense that they play strategies that increase their payoffs, but not the most rational strategies, i.e., Nash Equilibria (NEs). Moreover, when game history is available, certain LLMs, such as GPT-4, converge faster to the NE strategies, which indicates a higher degree of rationality than other models. At the same time, certain LLMs win more often when game history is available, and we argue that the winning rate reflects their ability to reason about the strategies of other players. Throughout our experiments, we observe that the ability to strictly follow game rules described in natural language also varies among the LLMs we tested. We provide an economics arena for the LLM research community as a dynamic benchmark for the above-mentioned abilities of LLMs: rationality, strategic reasoning, and instruction following.
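The abstract does not include the arena's implementation, so the following is a minimal sketch of the evaluation loop it describes: an LLM-based player repeatedly plays a competitive game, with the revealed game history as the experimental variable, and the harness tracks the winning rate, the distance of the empirical strategy from the NE, and rule violations (instruction following). Matching pennies is used here only as an illustrative competitive game with a known mixed NE, and `query_llm` is a hypothetical stand-in for a real model API; neither is specified by the paper.

```python
import random
from collections import Counter

# Matching pennies: a zero-sum game whose unique (mixed) NE is the
# 50/50 mixture over {"heads", "tails"}.  Player 0 wins on a match.
ACTIONS = ["heads", "tails"]

def payoff(a0: str, a1: str) -> int:
    return 1 if a0 == a1 else -1

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call (e.g., GPT-4).
    Replace with a real client; here we sample uniformly so the
    sketch runs offline."""
    return random.choice(ACTIONS)

def build_prompt(rules: str, history: list, reveal_history: bool) -> str:
    prompt = rules
    if reveal_history and history:  # the experimental variable
        prompt += "\nPrevious rounds (you vs. opponent): "
        prompt += "; ".join(f"{a0} vs {a1}" for a0, a1 in history)
    prompt += "\nAnswer with exactly one word: heads or tails."
    return prompt

RULES = ("You are playing matching pennies. You win if both players "
         "choose the same side; otherwise your opponent wins.")

def play(rounds: int = 100, reveal_history: bool = True):
    history, wins, violations = [], 0, 0
    for _ in range(rounds):
        a0 = query_llm(build_prompt(RULES, history, reveal_history))
        a1 = random.choice(ACTIONS)   # opponent (another LLM in the arena)
        if a0 not in ACTIONS:         # instruction-following check
            violations += 1
            a0 = "heads"              # fallback on a rule violation
        wins += payoff(a0, a1) > 0
        history.append((a0, a1))
    freq = Counter(a0 for a0, _ in history)
    # Gap between the empirical strategy and the 50/50 NE mixture.
    ne_gap = abs(freq["heads"] / rounds - 0.5)
    return {"win_rate": wins / rounds, "ne_gap": ne_gap,
            "violations": violations}

if __name__ == "__main__":
    print("with history:   ", play(reveal_history=True))
    print("without history:", play(reveal_history=False))
```

Comparing the two runs mirrors the paper's protocol: a more rational model should shrink `ne_gap` faster when history is revealed, while `win_rate` reflects reasoning about the opponent and `violations` reflects instruction following.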
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2817