Abstract: In existing abstractive summarization models, the autoregressive decoding paradigm inherently relies on previously generated tokens, so prediction errors propagate through subsequent steps. To mitigate these errors, we need a way to remodel token dependencies during text generation. In this paper, we introduce MDSumma (short for Masked Decoder for Summarization), which masks a portion of the decoder input tokens to alleviate over-reliance on antecedent tokens. To further improve the flexibility and diversity of the textual representation, we employ a variational autoencoder that samples continuous latent variables from a learned probability distribution to explicitly model the underlying semantics of the target summaries. Our architecture strikes a good balance between encoder contextual representation and decoder prediction, narrowing the gap between training and inference. Experimental results on three benchmark datasets show that our proposed method significantly outperforms existing state-of-the-art approaches on both ROUGE and diversity scores.
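A minimal sketch of the decoder-input masking idea described above, not the authors' implementation: during teacher forcing, a random subset of the gold decoder input tokens is replaced with a mask token so the model cannot rely solely on the preceding ground-truth tokens. The token ids (`MASK_ID`, `PAD_ID`), the `mask_ratio` value, and the toy inputs are illustrative assumptions.

```python
# Sketch only: randomly mask a fraction of decoder input tokens during teacher forcing.
import torch

MASK_ID = 0  # hypothetical [MASK] token id
PAD_ID = 1   # hypothetical padding token id

def mask_decoder_inputs(decoder_input_ids: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
    """Replace a random subset of non-pad decoder input tokens with MASK_ID."""
    probs = torch.full(decoder_input_ids.shape, mask_ratio)
    mask = torch.bernoulli(probs).bool() & (decoder_input_ids != PAD_ID)
    return decoder_input_ids.masked_fill(mask, MASK_ID)

# Usage: apply to the (shifted) target summary before feeding it to the decoder.
targets = torch.tensor([[5, 8, 9, 4, 1, 1],
                        [7, 3, 2, 6, 5, 1]])
decoder_inputs = mask_decoder_inputs(targets, mask_ratio=0.3)
print(decoder_inputs)
```

Because the decoder sometimes sees masked positions instead of the gold prefix at training time, it is pushed to use the encoder's contextual representation as well, which is one way to narrow the train/inference mismatch the abstract refers to.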