Domain-Slot Relationship Modeling Using a Pre-Trained Language Encoder for Multi-Domain Dialogue State Tracking

Published: 01 Jan 2022, Last Modified: 12 May 2023. IEEE ACM Trans. Audio Speech Lang. Process., 2022.
Abstract: Dialogue state tracking for multi-domain dialogues is challenging because the model must track dialogue states across multiple domains and slots. As pre-trained language models have become the de facto standard for natural language processing tasks, many recent studies use them to encode the dialogue context for predicting dialogue states. Model architectures with inductive biases for modeling the relationships among different domain-slot pairs are also emerging. Our work builds on these lines of research in multi-domain dialogue state tracking. We propose a model architecture that effectively models the relationships among domain-slot pairs using a pre-trained language encoder. Inspired by the way the special [CLS] token in BERT aggregates information over the whole sequence, we introduce one special token per domain-slot pair, each encoding information corresponding to its domain and slot. These special tokens are encoded together with the dialogue context by the pre-trained language encoder, which effectively models the relationships among the different domain-slot pairs. Experimental results on the MultiWOZ-2.0 and MultiWOZ-2.1 datasets show that our model outperforms other models under the same setting. Our ablation study has three parts: the first demonstrates the effectiveness of the relationship modeling, the second compares different pre-trained language encoders, and the third compares initialization methods for the special tokens. Qualitative analysis of the attention maps of the pre-trained language encoder shows that the special tokens encode relevant information during encoding by attending to one another.
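To make the mechanism concrete, below is a minimal sketch (not the authors' released code) of how per-domain-slot special tokens can be prepended to the dialogue context and encoded jointly with a pre-trained BERT encoder via the HuggingFace transformers library. The special-token names, the toy domain-slot list, and the example dialogue are illustrative assumptions; the actual model adds a per-slot prediction head on top of each special token's hidden state.

```python
# Sketch: joint encoding of domain-slot special tokens with the dialogue context.
# Assumes bert-base-uncased and a toy domain-slot list; not the authors' code.
import torch
from transformers import BertModel, BertTokenizerFast

# Hypothetical subset of MultiWOZ domain-slot pairs, one special token each.
domain_slot_pairs = ["hotel-area", "hotel-pricerange", "train-departure"]
special_tokens = [f"[{ds}]" for ds in domain_slot_pairs]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})

encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.resize_token_embeddings(len(tokenizer))  # embedding rows for the new tokens

dialogue_context = "i need a cheap hotel in the north . also a train to cambridge please ."

# Prepend the special tokens so they attend to the context and to each other
# inside the encoder's self-attention layers.
text = " ".join(special_tokens) + " " + dialogue_context
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

# Position 0 is [CLS]; positions 1 .. len(special_tokens) hold one contextual
# representation per domain-slot pair, each of which would feed a slot-value head.
slot_states = hidden[0, 1 : 1 + len(special_tokens)]
print(slot_states.shape)  # torch.Size([3, 768])
```

In this sketch, the new token embeddings start from random initialization; the paper's third ablation compares such initialization choices for the special tokens.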