AdaptFSP: Adaptive Fictitious Self Play

22 Sept 2022 (modified: 13 Feb 2023)
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Deep reinforcement learning, game theory, exploitability
TL;DR: Use deep RL to modify FSP for better performance in continuous control games.
Abstract: Fictitious Self-Play (FSP) is an iterative algorithm that can learn approximate Nash equilibria in many types of two-player zero-sum games. At each FSP iteration, a best response is learned to the opponent's meta strategy. However, FSP can be slow to converge in continuous control games in which two embodied agents compete against each other. We propose Adaptive FSP (AdaptFSP), a deep reinforcement learning (RL) algorithm inspired by FSP. The main idea is that, instead of training a best response only against the meta strategy, we additionally train it against an adaptive deep RL agent that can itself adapt to the best response. We evaluate on four test domains: two tabular cases (random normal-form matrix games and Leduc poker) and two continuous control tasks (Thou Shall Not Pass and a soccer environment). In all four, AdaptFSP achieves lower exploitability more quickly than vanilla FSP.
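To make the baseline concrete, the following is a minimal sketch of vanilla tabular fictitious play on a two-player zero-sum matrix game, together with an exploitability measure. It is not the authors' code and does not implement AdaptFSP, whose adaptive opponent is not specified in the abstract. All function names here are hypothetical, the "meta strategy" is assumed to be the empirical average of past best responses, and exploitability is assumed to be the sum of both players' best-response gains (some papers halve this quantity).

```python
import random


def row_values(payoff, col_strategy):
    """Expected payoff of each pure row action against a mixed column strategy."""
    return [sum(a * q for a, q in zip(row, col_strategy)) for row in payoff]


def col_values(payoff, row_strategy):
    """Row player's expected payoff for each pure column action."""
    n_cols = len(payoff[0])
    return [sum(p * row[j] for p, row in zip(row_strategy, payoff))
            for j in range(n_cols)]


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains (assumed definition).

    The row player maximizes and the column player minimizes, so the
    result is non-negative and equals zero only at a Nash equilibrium.
    """
    return (max(row_values(payoff, col_strategy))
            - min(col_values(payoff, row_strategy)))


def fictitious_play(payoff, iterations):
    """Each player best-responds to the opponent's average (meta) strategy."""
    row_counts = [0] * len(payoff)
    col_counts = [0] * len(payoff[0])
    row_counts[0] = col_counts[0] = 1  # arbitrary initial pure strategies
    history = []
    for t in range(1, iterations + 1):
        # Before the update at step t, each count vector sums to t.
        row_avg = [c / t for c in row_counts]
        col_avg = [c / t for c in col_counts]
        history.append(exploitability(payoff, row_avg, col_avg))
        r_vals = row_values(payoff, col_avg)
        c_vals = col_values(payoff, row_avg)
        row_counts[max(range(len(r_vals)), key=r_vals.__getitem__)] += 1
        col_counts[min(range(len(c_vals)), key=c_vals.__getitem__)] += 1
    total = iterations + 1
    return ([c / total for c in row_counts],
            [c / total for c in col_counts],
            history)


def random_zero_sum_game(size, seed):
    """Random normal-form game: entries are the row player's payoffs."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


if __name__ == "__main__":
    game = random_zero_sum_game(5, seed=0)
    _, _, curve = fictitious_play(game, 2000)
    print("exploitability at start:", curve[0])
    print("exploitability at end:  ", curve[-1])
```

Plotting `history` against the iteration index gives the kind of exploitability curve the abstract refers to when it compares convergence speed. In the deep RL setting the exact best-response step would be replaced by a learned policy, which this sketch does not attempt.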
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)