Abstract: Black-box language models (BLMs), large language models accessible only via an API, showcase remarkable few-shot in-context learning performance on many NLP tasks. Our work explores their performance on end-to-end task-oriented dialog (TOD) systems in the setting where a reasonably sized training set is available. Benchmarking three BLMs (Google's text-bison and OpenAI's gpt-3.5-turbo and gpt-4) on two end-to-end TOD datasets (MultiWoZ and SMD), we find that their performance is not on par with existing supervised SoTA models. In response, we propose SincTOD, which synergizes trained models with BLMs for superior performance. At a high level, SincTOD uses supervised models to provide additional hints and exemplar selection for the BLM's in-context prompts. We show that SincTOD with gpt-4 outperforms SoTA baselines on both datasets. Further, SincTOD also showcases strong performance in the low-data setting, where it can be trained with fewer than 300 dialogs.
Paper Type: short
Research Area: Dialogue and Interactive Systems
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.