Abstract: Recently, encoder-decoder models with attention have shown meaningful results on abstractive summarization tasks. In the standard attention mechanism, the attention distribution is generated based only on the current decoder state. However, since there are patterns in how summaries are written, patterns should also exist in how attention is paid. In this work, we propose an attention history-based attention model that exploits such patterns in the attention history. We build an additional recurrent network, the attention reader network, to model the attention patterns. We also employ an accumulation vector that keeps track of the total amount of effective attention paid to each part of the input text, guided by an additional network named the accumulation network. Both the attention reader network and the accumulation vector are used as additional inputs to the attention mechanism. Evaluation results on the CNN/Daily Mail dataset show that our method better captures attention patterns and achieves higher ROUGE scores than strong baselines.
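Below is a minimal sketch (PyTorch) of how attention history could be fed back into an additive attention mechanism, assuming a per-source-position GRU cell as the attention reader network and a scalar gate as the accumulation network; all module names, dimensions, and update rules here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoryAwareAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim, reader_dim):
        super().__init__()
        # Attention reader network (assumed): a GRU cell run per source
        # position over the attention weights that position has received.
        self.reader = nn.GRUCell(input_size=1, hidden_size=reader_dim)
        # Accumulation network (assumed): gates how much of the new
        # attention distribution is added to the accumulation vector.
        self.accum_gate = nn.Linear(dec_dim + reader_dim, 1)
        # Additive attention scoring with the extra history-based inputs.
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.W_reader = nn.Linear(reader_dim, attn_dim, bias=False)
        self.W_accum = nn.Linear(1, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, reader_state, accum):
        # enc_states:   (batch, src_len, enc_dim)  encoder hidden states
        # dec_state:    (batch, dec_dim)           current decoder state
        # reader_state: (batch, src_len, reader_dim) reader RNN states
        # accum:        (batch, src_len)           accumulated attention
        scores = self.v(torch.tanh(
            self.W_enc(enc_states)
            + self.W_dec(dec_state).unsqueeze(1)
            + self.W_reader(reader_state)
            + self.W_accum(accum.unsqueeze(-1))
        )).squeeze(-1)                               # (batch, src_len)
        attn = F.softmax(scores, dim=-1)             # attention distribution
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)

        # Update the attention reader network with the new attention weights.
        b, s, r = reader_state.shape
        new_reader = self.reader(
            attn.reshape(b * s, 1), reader_state.reshape(b * s, r)
        ).reshape(b, s, r)

        # Update the accumulation vector with a gated amount of attention.
        gate = torch.sigmoid(self.accum_gate(
            torch.cat([dec_state, new_reader.mean(dim=1)], dim=-1)
        ))                                           # (batch, 1)
        new_accum = accum + gate * attn
        return context, attn, new_reader, new_accum
```

In this sketch the decoder would call the module once per output token, passing back `new_reader` and `new_accum` so that later attention distributions are conditioned on where attention has already been paid.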