Generation with Dynamic Vocabulary

ACL ARR 2024 April Submission854 Authors

16 Apr 2024 (modified: 02 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract: Vocabulary is a crucial component of language models. Traditional language models generate text by selecting tokens from a fixed vocabulary. In this paper, we introduce a novel dynamic vocabulary setting, in which the vocabulary can include arbitrary text spans on demand; we refer to these spans as phrases. Phrases act as basic building blocks, just as tokens do in a fixed vocabulary. Our proposed model can be deployed in a plug-and-play manner. Extensive experiments show that our approach yields superior generation quality: for instance, compared with the standard language model, the MAUVE score increases from 20.47% to 25.69%. We also demonstrate that our method can be applied effectively to specific domains in a training-free manner and that it excels at citation generation, substantially improving citation results without compromising answer accuracy.
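The abstract does not say how phrases are encoded or scored, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes (hypothetically) that each phrase has an embedding in the same space as the token embeddings and that a single decoding step scores tokens and phrases together with a dot product against the model's hidden state. The names `dynamic_vocab_step`, `token_emb`, and `phrase_emb` are our own, and the 2-d vectors are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dynamic_vocab_step(
    hidden: Vector,
    token_emb: Dict[str, Vector],
    phrase_emb: Dict[str, Vector],
) -> Tuple[str, float]:
    """Greedy choice over fixed tokens plus on-demand phrases (hypothetical scoring)."""
    # Concatenate rather than merge, so a phrase never silently replaces a token.
    entries = list(token_emb.items()) + list(phrase_emb.items())
    probs = softmax([dot(hidden, emb) for _, emb in entries])
    best = max(range(len(entries)), key=probs.__getitem__)
    return entries[best][0], probs[best]


# Toy embeddings; real phrase vectors would come from some encoder (assumed).
tokens = {"the": [0.5, 0.1], "dynamic": [0.2, 0.9]}
phrases = {"dynamic vocabulary": [2.0, 0.0]}
hidden = [1.0, 0.0]

print(dynamic_vocab_step(hidden, tokens, phrases))  # phrase wins: two words in one step
print(dynamic_vocab_step(hidden, tokens, {}))       # no phrases: falls back to "the"
```

Because the phrase table is just an input to each step, swapping in a different (for example, domain-specific) phrase set in this toy requires no retraining. This loosely mirrors the plug-and-play, training-free framing in the abstract, but it is not evidence for it.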
Paper Type: Long
Research Area: Generation
Research Area Keywords: dynamic vocabulary, text generation
Languages Studied: English
Submission Number: 854