Abstract: We present a method for performing zero-shot Dialogue State Tracking (DST) by casting the task as a learning-to-ask-questions framework. The framework learns to pair the best question generation (QG) strategy with in-domain question answering (QA) methods to extract slot values from a dialogue without any human intervention. A novel self-supervised QA pretraining step using in-domain data is essential to learn the structure without requiring any slot-filling annotations. Moreover, we show that QG methods need to be aligned with the same grammatical person used in the dialogue. Empirical evaluation on the MultiWOZ 2.1 dataset demonstrates that our approach, when used alongside robust QA models, outperforms existing zero-shot methods in the challenging task of zero-shot cross domain adaptation-given a comparable amount of domain knowledge during data creation. Finally, we analyze the impact of the types of questions used, and demonstrate that the algorithmic approach outperforms template-based question generation.
Loading