Limitations in Planning Ability in AlphaZero

Published: 10 Oct 2024, Last Modified: 09 Nov 2024, NeurIPS 2024 Workshop on Behavioral ML, CC BY 4.0
Keywords: AlphaZero, Deep Reinforcement Learning, Computational Cognitive Science
TL;DR: This paper explores AlphaZero's planning limitations in 4-in-a-row, revealing a 93% failure rate on complex puzzles and a 15% improvement in accuracy when a human-inspired cognitive value function is integrated.
Abstract: AlphaZero, a deep reinforcement learning algorithm, has achieved superhuman performance in complex games such as chess and Go. However, its strategic planning ability beyond simply winning games remains unclear. We investigated this using 4-in-a-row, a game used to study human planning, and analyzed AlphaZero's feature learning and puzzle-solving abilities. Despite strong gameplay, AlphaZero exhibited a 93% failure rate on puzzles. Our feature analysis showed that the strategies it learned through self-play lacked certain critical human-like features. We integrated a human-inspired cognitive value function into its policy and value outputs, which led to a 15% improvement in puzzle-solving accuracy. Our findings highlight the potential for human insights to enhance AI's strategic planning beyond self-play.
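The abstract does not specify how the cognitive value function is combined with AlphaZero's policy and value outputs. One plausible reading is a convex mix: the network's value is blended with a heuristic value, and the network's policy prior is blended with a softmax over the heuristic value of each candidate move. The sketch below illustrates that idea only. It assumes a 4×9 board, a hypothetical window-counting heuristic (`WEIGHTS`, `heuristic_value`), and hypothetical mixing parameters `lam` and `beta`; none of these are taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Assumption: 4 rows x 9 columns; 1 = player to move, -1 = opponent, 0 = empty.
ROWS, COLS = 4, 9
Board = List[List[int]]
Move = Tuple[int, int]

# Hypothetical feature weights for a line of 4 holding 2 or 3 pieces of one colour.
WEIGHTS = {2: 0.1, 3: 0.4}


def windows() -> List[List[Move]]:
    """All straight lines of 4 cells: horizontal, vertical and both diagonals."""
    lines = []
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS for rr, cc in cells):
                    lines.append(cells)
    return lines


WINDOWS = windows()


def heuristic_value(board: Board) -> float:
    """Hypothetical feature-based value in [-1, 1] from player +1's perspective."""
    score = 0.0
    for cells in WINDOWS:
        vals = [board[r][c] for r, c in cells]
        mine, theirs = vals.count(1), vals.count(-1)
        if mine == 4:
            return 1.0
        if theirs == 4:
            return -1.0
        if theirs == 0:  # line still open for player +1
            score += WEIGHTS.get(mine, 0.0)
        if mine == 0:  # line still open for the opponent
            score -= WEIGHTS.get(theirs, 0.0)
    return math.tanh(score)


def blended_value(v_net: float, board: Board, lam: float = 0.5) -> float:
    """Convex mix of the network's value and the heuristic value."""
    return (1.0 - lam) * v_net + lam * heuristic_value(board)


def blended_policy(p_net: Dict[Move, float], board: Board,
                   lam: float = 0.5, beta: float = 5.0) -> Dict[Move, float]:
    """Mix the renormalised network prior with a softmax over heuristic move scores."""
    legal = [m for m in p_net if board[m[0]][m[1]] == 0]
    if not legal:
        return {}
    scores = {}
    for r, c in legal:
        board[r][c] = 1  # try the move for the player to move
        scores[(r, c)] = beta * heuristic_value(board)
        board[r][c] = 0  # undo it
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {m: math.exp(s - top) for m, s in scores.items()}
    z = sum(exp.values())
    net_z = sum(p_net[m] for m in legal) or 1.0
    return {m: (1 - lam) * p_net[m] / net_z + lam * exp[m] / z for m in legal}


if __name__ == "__main__":
    board = [[0] * COLS for _ in range(ROWS)]
    board[0][0] = board[0][1] = board[0][2] = 1  # three in a row on the top row
    uniform = {(r, c): 1.0 / (ROWS * COLS) for r in range(ROWS) for c in range(COLS)}
    policy = blended_policy(uniform, board)
    print(max(policy, key=policy.get))  # → (0, 3), the winning move
```

With a uniform network prior, the heuristic term alone pushes probability toward the move that completes four in a row. A convex mix was chosen here because it keeps both outputs in their original ranges (a probability distribution and a value in [-1, 1]); the paper may combine them differently.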
Submission Number: 45