Abstract: The remarkable capabilities of Large Language Models (LLMs) in text generation have been widely recognized. However, generating text one token at a time is inefficient, and adapting these models to new data remains challenging. To address these challenges, we introduce a novel approach to language modeling: Chunk-Distilled Language Modeling (CD-LM). By integrating deep neural networks with a straightforward retrieval module, our method generates multi-token text chunks containing fine-grained information in a single decoding step. Our retrieval framework enables flexible construction of model- or domain-specific datastores, either leveraging the internal knowledge of pre-trained or fine-tuned models, or incorporating expert insights from human-annotated corpora. This adaptability allows for enhanced control over the language model's distribution without requiring additional training. We present a formal formulation of the CD-LM framework, along with quantifiable performance metrics, and demonstrate its efficacy in improving language model performance and efficiency across a diverse set of downstream tasks, including language modeling, text generation, and domain adaptation.
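To make the chunk-generation idea concrete, here is a minimal, self-contained sketch of the decoding loop the abstract describes: a retrieval datastore maps context suffixes to candidate multi-token chunks, and when a lookup hits, several tokens are emitted in a single decoding step instead of one. All names here (`ChunkDatastore`, `toy_lm_next_token`, `cd_lm_decode`) and the fixed-suffix matching rule are hypothetical illustrations under simplifying assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of chunk-level retrieval decoding in the spirit of CD-LM.
# The datastore keying scheme and the toy LM below are illustrative assumptions.

from collections import defaultdict
from typing import Dict, List, Tuple


class ChunkDatastore:
    """Maps a fixed-length context suffix to candidate multi-token chunks."""

    def __init__(self, context_len: int = 2):
        self.context_len = context_len
        self.chunks: Dict[Tuple[str, ...], List[Tuple[str, ...]]] = defaultdict(list)

    def add(self, context: List[str], chunk: List[str]) -> None:
        # Chunks could come from a pre-trained/fine-tuned model's own
        # generations or from a human-annotated corpus, per the abstract.
        key = tuple(context[-self.context_len:])
        self.chunks[key].append(tuple(chunk))

    def lookup(self, context: List[str]) -> List[Tuple[str, ...]]:
        return self.chunks.get(tuple(context[-self.context_len:]), [])


def toy_lm_next_token(context: List[str]) -> str:
    """Stand-in for a neural LM's ordinary one-token-at-a-time decoder."""
    vocab = ["the", "model", "generates", "text", "."]
    return vocab[len(context) % len(vocab)]


def cd_lm_decode(prompt: List[str], store: ChunkDatastore, max_len: int = 12) -> List[str]:
    out = list(prompt)
    while len(out) < max_len:
        candidates = store.lookup(out)
        if candidates:
            # Chunk hit: emit multiple tokens in a single decoding step.
            out.extend(candidates[0])
        else:
            # Miss: fall back to standard token-level generation.
            out.append(toy_lm_next_token(out))
    return out


if __name__ == "__main__":
    store = ChunkDatastore(context_len=2)
    store.add(["language", "model"], ["distribution", "without", "training"])
    print(" ".join(cd_lm_decode(["the", "language", "model"], store)))
```

Note that in this sketch the retrieval module only steers decoding; the underlying model is never updated, which mirrors the abstract's claim of controlling the model's distribution without additional training.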
Paper Type: Long
Research Area: Generation
Research Area Keywords: retrieval-augmented generation, domain adaptation, inference methods, retrieval-augmented models
Languages Studied: English
Submission Number: 5606