TL;DR: We propose an algorithm that discovers the sub-task structure of an action sequence paired with a natural language instruction; the discovered structure is useful for training agents to solve the task.
Abstract: When mapping a natural language instruction to a sequence of actions, it is often useful to identify sub-tasks in the instruction.
Such sub-task segmentation, however, is not necessarily provided in the training data.
We present the A2LCTC (Action-to-Language Connectionist Temporal Classification) algorithm to automatically discover a sub-task segmentation of an action sequence.
A2LCTC does not require annotations of the correct sub-task segments; it learns to find them from paired instructions and action sequences in a weakly supervised manner.
We experiment with the ALFRED dataset and show that A2LCTC accurately finds the sub-task structures.
With the discovered sub-task segments, we also train agents on the downstream task and empirically show that our algorithm improves their performance.
Track: Archival (will appear in ACL workshop proceedings)
ACL Rolling Review: https://openreview.net/forum?id=Y2K3dtqsPW5