Abstract: We are working towards AI planning systems with natural language interfaces. In this paper, we tackle the semantic parsing problem of learning to set the logical goals of a planning system from a natural language description of the task. The current state of the art in semantic parsing is supervised learning with deep neural networks, but this requires large amounts of data labelled by domain experts. To reduce this need, we additionally use a reward signal obtained from completing the AI planning task. We formalize the problem as a constrained combinatorial contextual bandit: the context is produced by a deep neural network used for feature extraction, and the constrained combinatorial structure of the task is exploited to make learning more efficient. We show this gain theoretically, by deriving a lower regret bound, and experimentally, in our extension of the TextWorld problem.
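The abstract does not specify the learning algorithm, so the following is only a minimal sketch of what a constrained combinatorial contextual bandit for goal setting could look like. It swaps in standard per-atom LinUCB, which is not necessarily the paper's method, and it rests on several hypothetical assumptions: a goal must pick exactly one atom from each group, the atom names are invented, the planner returns one reward per chosen atom (semi-bandit feedback), and one-hot vectors stand in for neural-network features of the task description. The point it illustrates is that when the constraint decomposes by group, the best feasible goal is found with a per-group argmax instead of enumerating every combination of atoms.

```python
import numpy as np


class ConstrainedCombinatorialLinUCB:
    """Hypothetical sketch: one LinUCB model per candidate goal atom.

    Assumed constraint: a valid goal contains exactly one atom from each group.
    """

    def __init__(self, groups, dim, alpha=1.0):
        self.groups = groups  # list of lists of (hypothetical) atom names
        self.alpha = alpha    # exploration strength
        atoms = [a for g in groups for a in g]
        self.A = {a: np.eye(dim) for a in atoms}    # ridge-regression Gram matrices
        self.b = {a: np.zeros(dim) for a in atoms}  # reward-weighted context sums

    def _ucb(self, atom, x):
        # Upper confidence bound on the expected reward of one atom in context x.
        A_inv = np.linalg.inv(self.A[atom])
        theta = A_inv @ self.b[atom]
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, x):
        # The constraint decomposes by group, so the best feasible goal is the
        # per-group argmax: O(sum of group sizes) instead of O(product of sizes).
        return [max(g, key=lambda a: self._ucb(a, x)) for g in self.groups]

    def update(self, x, goal, atom_rewards):
        # Semi-bandit update (assumed feedback model): one reward per chosen atom.
        for atom, r in zip(goal, atom_rewards):
            self.A[atom] += np.outer(x, x)
            self.b[atom] += r * x


# Toy usage: two task descriptions whose correct goals differ.
groups = [["key_in:box", "key_in:chest"], ["apple_on:table", "apple_on:shelf"]]
truth = {0: {"key_in:box", "apple_on:table"}, 1: {"key_in:chest", "apple_on:shelf"}}
contexts = np.eye(2)  # stand-in for neural-network features of the two descriptions

bandit = ConstrainedCombinatorialLinUCB(groups, dim=2)
for t in range(200):
    k = t % 2
    x = contexts[k]
    goal = bandit.select(x)
    bandit.update(x, goal, [1.0 if a in truth[k] else 0.0 for a in goal])

print(bandit.select(contexts[0]))  # ['key_in:box', 'apple_on:table']
print(bandit.select(contexts[1]))  # ['key_in:chest', 'apple_on:shelf']
```

With a single scalar reward for the whole task, as the abstract describes, credit assignment across atoms is harder than in this semi-bandit toy, so the sketch should be read as an illustration of the decomposition argument only.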
Paper Type: long