What Makes Attention Distillation Work? An Exploration of Attention Distillation in Retrieval-Based Language Model

Anonymous

16 Dec 2023, ACL ARR 2023 December Blind Submission
Abstract: Retrieval-based language models address the limitations of large language models by enabling real-time knowledge updates, which yield more accurate answers. An efficient technique for training retrieval-based models is attention distillation, which uses attention scores as a supervisory signal in place of manually annotated query-document pairs. Despite its growing popularity, the detailed mechanisms behind the success of attention distillation remain unexplored, particularly the specific patterns it leverages to benefit training. In this paper, we address this gap by conducting a comprehensive review of the attention distillation workflow and identifying key factors that influence the learning quality of retrieval-based language models. We further propose indicators for optimizing training methods and avoiding ineffective training.
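
To make the supervisory signal described in the abstract concrete, the following is a minimal sketch of an attention-distillation objective of the kind used in reader-to-retriever training: the reader's per-passage attention mass serves as a teacher distribution, and the retriever's relevance scores are trained toward it with a KL-divergence loss. All names, shapes, and the aggregation choice here are illustrative assumptions, not details taken from the paper.

```python
# Sketch of an attention-distillation loss for a retriever (illustrative only).
# Assumes the reader's cross-attention has already been aggregated into one
# scalar score per retrieved passage.
import torch
import torch.nn.functional as F

def attention_distillation_loss(retriever_scores: torch.Tensor,
                                attention_scores: torch.Tensor,
                                temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between the distribution implied by the reader's
    aggregated attention scores (teacher) and the retriever's passage
    distribution (student).

    retriever_scores: (batch, n_passages) relevance scores from the retriever.
    attention_scores: (batch, n_passages) attention mass each passage receives
                      in the reader, treated as a fixed supervisory signal.
    """
    # Teacher: normalize the attention scores into a distribution; no gradient
    # flows back into the reader.
    with torch.no_grad():
        teacher = F.softmax(attention_scores / temperature, dim=-1)
    # Student: retriever scores as log-probabilities over the same passages.
    student_log_probs = F.log_softmax(retriever_scores / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher, reduction="batchmean")

# Example: 2 queries, each with 4 retrieved passages.
retriever_scores = torch.randn(2, 4, requires_grad=True)
attention_scores = torch.rand(2, 4)
loss = attention_distillation_loss(retriever_scores, attention_scores)
loss.backward()  # gradients reach only the retriever scores
```

Note that this replaces annotated query-document relevance labels with the reader's attention pattern; which aggregation of that pattern makes the signal informative is exactly the kind of factor the paper sets out to analyze.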
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English