Adversarial Policies Beat Professional-Level Go AIs

Tony Tong Wang; Adam Gleave; Nora Belrose; Tom Tseng; Michael D Dennis; Yawen Duan; Viktor Pogrebniak; Sergey Levine; Stuart Russell

Published: 01 Feb 2023, Last Modified: 14 Jan 2026 | Submitted to ICLR 2023 | Readers: Everyone

Abstract: We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >80% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo; in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes.

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/adversarial-policies-beat-professional-level/code)
