Generation with Dynamic Vocabulary

ACL ARR 2024 June Submission 4076 Authors

16 Jun 2024 (modified: 13 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Vocabulary is a crucial component of language models. Traditional language models generate text by selecting tokens from a fixed vocabulary. In this paper, we introduce a novel dynamic setting for the vocabulary, under which the vocabulary can include arbitrary text spans on demand. These text spans act as basic building blocks, akin to tokens in a fixed vocabulary. The proposed model can be deployed in a plug-and-play manner. Extensive experimental results demonstrate that our approach yields superior generation quality. For instance, compared to the standard language model, the MAUVE metric increases from 20.47% to 25.69%. We also demonstrate that the dynamic vocabulary can be applied effectively to different domains in a training-free manner, and that it helps generate reliable citations in question answering tasks, substantially improving citation results without compromising answer accuracy.
Paper Type: Long
Research Area: Generation
Research Area Keywords: dynamic vocabulary, text generation
Languages Studied: English
Submission Number: 4076
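
Illustrative sketch of the dynamic-vocabulary idea from the abstract: the abstract does not describe how phrases are scored, so the toy example below assumes one simple possibility, namely that fixed tokens and on-demand text spans each have a vector and are ranked together in a single softmax against the model's hidden state. All names (`next_unit_distribution`, `token_table`, `phrase_table`) and all numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def next_unit_distribution(hidden, token_table, phrase_table):
    """Score fixed tokens and on-demand phrases in one shared softmax.

    Assumption: a phrase is treated exactly like a token, i.e. it has a
    vector and competes for probability mass at the same generation step.
    Keys are assumed to be unique across the two tables.
    """
    units = list(token_table) + list(phrase_table)
    vectors = list(token_table.values()) + list(phrase_table.values())
    probs = softmax([dot(hidden, v) for v in vectors])
    return dict(zip(units, probs))


# Hypothetical toy embeddings: a tiny fixed vocabulary ...
token_table = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}
# ... plus one text span added on demand for this generation call.
phrase_table = {"dynamic vocabulary": [2.0, 1.0]}
hidden = [1.0, 0.5]  # stand-in for the language model's hidden state

dist = next_unit_distribution(hidden, token_table, phrase_table)
best = max(dist, key=dist.get)  # unit emitted under greedy decoding
```

In this sketch, selecting a phrase emits a multi-word span in a single step, and swapping in a different `phrase_table` changes the available spans without touching the fixed token table, which is one way to read the plug-and-play and training-free claims.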