Interactive Grounded Language Acquisition and Generalization in a 2D World


Nov 03, 2017 (modified: Dec 12, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of its surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher’s language from scratch based on two language use cases: sentence-directed navigation and question answering. It simultaneously learns the visual representations of the world, the language, and the action control. By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably extrapolates to interpret sentences that contain new word combinations or new words that did not appear in training sentences. The new words are transferred from the answers of language prediction. This language ability is trained and evaluated on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. The proposed model significantly outperforms five comparison methods on interpreting zero-shot sentences. We additionally demonstrate human-interpretable intermediate outputs of the model.
  • TL;DR: Training an agent in a 2D virtual world for grounded language acquisition and generalization.
  • Keywords: grounded language learning and generalization, zero-shot language learning