On Sampling Information Sets to Learn from Imperfect Information

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Games, Imperfect Information, Neural Networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We investigate how the number of states sampled from an information set influences how well a learner is able to approximate the value of the information set.
Abstract: In many real-world decision-making scenarios, agents are confronted with incomplete and imperfect information, requiring them to make choices based on limited knowledge. Imperfect-information games tackle this challenge by organising the different possible situations into so-called information sets, i.e. sets of possible world states that are indistinguishable from one observer's perspective; directly evaluating an information set, however, is difficult. A common but often suboptimal strategy is to evaluate the individual states in the set with a perfect-information evaluator and combine the results. This not only raises the problem of translating perfect-information evaluations to imperfect-information settings but is also immensely costly in situations with extensive hidden information. This work focuses on learning direct evaluators for information sets by assessing only a subset of the states in the information set, thereby reducing the overall cost of evaluation. Critically, we focus on one question: how many states should be sampled from a given information set? This involves a trade-off between the cost of computing a training signal and its accuracy. We present experimental results in three settings: an artificial MNIST variant with hidden information, Heads-Up Poker, and Reconnaissance Blind Chess. Our results show that the number of sampled states significantly influences the efficiency of training neural networks, but with diminishing returns when sampling a large number of states. Notably, in the three domains considered, using one, two, and two samples, respectively, yields the best performance with respect to the total number of evaluations required. This research contributes to the understanding of how to optimise the sampling of information sets under incomplete information, offering practical insight into the balance between computational cost and accuracy.
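As an illustration only (not code from the submission): the training-signal construction the abstract describes can be sketched as sampling a few states from an information set, scoring each with a perfect-information evaluator, and averaging. The names `information_set_target`, `evaluate_state`, `info_set`, and `num_samples` below are hypothetical placeholders, not the authors' API.

```python
import random

def information_set_target(info_set, evaluate_state, num_samples):
    """Approximate the value of an information set by sampling.

    Draws `num_samples` states uniformly without replacement from the
    information set, scores each with a perfect-information evaluator,
    and averages the results to obtain a (noisy) training target for a
    network that evaluates the information set directly.
    """
    k = min(num_samples, len(info_set))
    sampled = random.sample(info_set, k)
    values = [evaluate_state(state) for state in sampled]
    return sum(values) / len(values)

# Hypothetical usage: the abstract's best-performing settings correspond
# to num_samples=1 or num_samples=2 depending on the domain.
# target = information_set_target(states, my_evaluator, num_samples=2)
```

Larger `num_samples` reduces the variance of the target but multiplies the number of perfect-information evaluations per training example, which is exactly the cost/accuracy trade-off the paper studies.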
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8028