Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals

Yue Wu; Yewen Fan; Paul Pu Liang; Amos Azaria; Yuanzhi Li; Tom Mitchell

Published: 03 Mar 2023, Last Modified: 20 Apr 2023, RRL 2023 Oral, Readers: Everyone

Keywords: Reinforcement Learning, Instruction Manual, Atari Games, Large Language Models, Language Models, Zero-shot, Few-shot, In-context prompting

TL;DR: We propose the Read and Reward framework that speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers.

Abstract: High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. An auxiliary reward is then provided to a standard A2C RL agent when an interaction is detected. When assisted by our design, A2C improves on 4 games in the Atari environment with sparse rewards, and requires 1000x fewer training frames than the previous SOTA Agent57 on Skiing, the hardest game in Atari.
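The auxiliary-reward step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the +1/-1 encoding of the Reasoning module's output, the `scale` parameter, and the example objects are all assumptions made for illustration. The sketch only shows the general idea of adding a manual-derived bonus or penalty to the environment reward when an object-agent interaction is detected.

```python
from typing import Dict, Iterable

# Hypothetical output of the Reasoning module: for each object type,
# +1 if the manual suggests interacting with it is good, -1 if bad.
Judgments = Dict[str, int]


def auxiliary_reward(detected: Iterable[str], judgments: Judgments,
                     scale: float = 1.0) -> float:
    """Sum the manual-derived signs of all objects the agent interacted with."""
    # Objects the manual says nothing about contribute 0.
    return scale * sum(judgments.get(obj, 0) for obj in detected)


def shaped_reward(env_reward: float, detected: Iterable[str],
                  judgments: Judgments, scale: float = 1.0) -> float:
    """Environment reward plus the auxiliary reward for detected interactions."""
    return env_reward + auxiliary_reward(detected, judgments, scale)


if __name__ == "__main__":
    # Made-up judgments for a skiing-like game; not taken from the paper.
    judgments = {"flag": 1, "tree": -1}
    print(shaped_reward(0.0, ["tree"], judgments))  # interaction with a tree
    print(shaped_reward(0.0, [], judgments))        # no interaction detected
```

In a sparse-reward game the environment reward is zero on most frames, so a term of this kind is what gives the agent a learning signal between the rare environment rewards.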

Track: Technical Paper

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
