Compositional Data Augmentation for Abstractive Conversation Summarization


16 Nov 2021, 18:24 (edited 14 Jan 2022) | ACL ARR 2021 November Blind Submission | Readers: Everyone
  • Abstract: Recent abstractive conversation summarization systems generally rely on large-scale annotated summaries. However, collecting conversations and annotating their summaries is time-consuming and labor-intensive. To alleviate this data scarcity, we present Compo, a simple yet effective compositional data augmentation method for generating diverse, high-quality pairs of conversations and summaries. Specifically, we generate novel conversation-summary pairs by first extracting conversation snippets and summary sentences based on conversation stages, and then randomly composing them under constraints on temporal relations and semantic similarity. To deal with noise in the augmented data, we further use knowledge distillation to learn concise representations from a teacher model trained on high-quality data. Extensive experiments on benchmark datasets demonstrate that Compo significantly outperforms prior state-of-the-art baselines in both quantitative and qualitative evaluation, and exhibits a reasonable level of interpretability.
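The composition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Segment`, `jaccard`, `compose`, and `min_sim` are hypothetical, the stage labels are assumed to be given, and a simple token-overlap (Jaccard) score stands in for whatever semantic similarity measure the paper uses. The knowledge distillation step is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical container: one conversation stage and its aligned summary sentence.
    stage: int      # position of the stage in the conversation (0 = opening, ...)
    snippet: str    # utterances belonging to this stage
    summary: str    # summary sentence aligned with this stage


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for a semantic similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def compose(base, pool, min_sim=0.2, rng=None):
    """Swap one randomly chosen segment of `base` for a compatible segment from `pool`.

    A candidate is compatible if it has the same stage (temporal constraint)
    and its snippet is similar enough to the one it replaces (similarity constraint).
    Returns a (conversation, summary) pair, or None if no candidate qualifies.
    """
    rng = rng or random.Random()
    idx = rng.randrange(len(base))
    target = base[idx]
    candidates = [
        seg for seg in pool
        if seg.stage == target.stage
        and seg != target
        and jaccard(seg.snippet, target.snippet) >= min_sim
    ]
    if not candidates:
        return None
    segments = base[:idx] + [rng.choice(candidates)] + base[idx + 1:]
    conversation = "\n".join(s.snippet for s in segments)
    summary = " ".join(s.summary for s in segments)
    return conversation, summary
```

Because the replacement keeps the original stage position, the composed conversation preserves stage order, and the summary is rebuilt from the sentences aligned with the chosen segments. Raising `min_sim` trades diversity for fewer incoherent compositions.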