- Keywords: Neural Networks, Planning, Reinforcement Learning, Structured Prediction, WikiTableQuestions
- TL;DR: A model-based planning component improves RL-based semantic parsing on WikiTableQuestions.
- Abstract: We consider the problem of weakly supervised structured prediction (SP) with reinforcement learning (RL) – for example, given a database table and a question, perform a sequence of computation actions on the table, which generates a response and receives a binary success-failure reward. This line of research has been successful by leveraging RL to directly optimizes the desired metrics of the SP tasks – for example, the accuracy in question answering or BLEU score in machine translation. However, different from the common RL settings, the environment dynamics is deterministic in SP, which hasn’t been fully utilized by the model-freeRL methods that are usually applied. Since SP models usually have full access to the environment dynamics, we propose to apply model-based RL methods, which rely on planning as a primary model component. We demonstrate the effectiveness of planning-based SP with a Neural Program Planner (NPP), which, given a set of candidate programs from a pretrained search policy, decides which program is the most promising considering all the information generated from executing these programs. We evaluate NPP on weakly supervised program synthesis from natural language(semantic parsing) by stacked learning a planning module based on pretrained search policies. On the WIKITABLEQUESTIONS benchmark, NPP achieves a new state-of-the-art of 47.2% accuracy.