Curriculum Learning for Vision-and-Language Navigation

Jiwen Zhang; Zhongyu Wei; Jianqing Fan; Jiajie Peng

Published: 09 Nov 2021, Last Modified: 05 May 2023

Venue: NeurIPS 2021 Poster

Readers: Everyone

Keywords: Vision-and-language navigation, curriculum learning, multimodal learning

TL;DR: We propose an efficient curriculum learning method for the vision-and-language navigation task.

Abstract: Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment by following human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades agent performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that balances human prior knowledge about training samples with the agent's learning progress on them. We develop principles of curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, generalizability, and training efficiency of current state-of-the-art navigation agents without increasing model complexity.

Supplementary Material: pdf

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Code: https://github.com/IMNearth/Curriculum-Learning-For-VLN
