Learning Monotonic Alignments with Source-Aware GMM Attention

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone
Keywords: Monotonic alignments, sequence-to-sequence model, aligned attention, streaming speech recognition, long-form speech recognition
Abstract: Transformers with soft attention have been widely adopted in various sequence-to-sequence (Seq2Seq) tasks. Although soft attention is effective for learning semantic similarities between queries and keys based on their contents, it does not explicitly model the order of elements in a sequence, which is crucial for monotonic Seq2Seq tasks. Learning monotonic alignments between input and output sequences may be beneficial for long-form and online inference, applications that remain challenging for conventional soft attention. Herein, we focus on monotonic Seq2Seq tasks and propose a source-aware Gaussian mixture model attention in which the attention scores are computed monotonically, considering both the content and the order of the source sequence. We experimentally demonstrate that the proposed attention mechanism improves performance on online and long-form speech recognition without degrading offline, in-distribution speech recognition.
One-sentence Summary: We focus on monotonic sequence-to-sequence tasks and propose a source-aware GMM attention that enables online inference and improves long-form sequence generation in speech recognition.
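For intuition, the mechanism described above resembles Graves-style GMM attention in which the mixture means can only advance along the source positions, which is what yields monotonic alignments. The sketch below is a minimal, illustrative PyTorch implementation of that general idea, not the authors' method: the class name `MonotonicGMMAttention`, the mean-pooled encoder summary used as the "source-aware" conditioning, and all dimensions and constants are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicGMMAttention(nn.Module):
    """Illustrative GMM attention with monotonically advancing means.

    The mixture parameters are predicted from the decoder state and a
    pooled summary of the encoder outputs (an assumed stand-in for the
    paper's source-aware conditioning).
    """

    def __init__(self, query_dim, key_dim, num_mixtures=5):
        super().__init__()
        self.num_mixtures = num_mixtures
        # Predicts (log-weights, mean increments, log-scales) per mixture.
        self.param_net = nn.Linear(query_dim + key_dim, 3 * num_mixtures)

    def forward(self, query, keys, values, prev_mu):
        # query:   (B, query_dim)   decoder state at the current output step
        # keys:    (B, T, key_dim)  encoder outputs (source content)
        # values:  (B, T, value_dim)
        # prev_mu: (B, K)           mixture means from the previous step
        B, T, _ = keys.shape
        pooled = keys.mean(dim=1)  # crude content summary (assumption)
        params = self.param_net(torch.cat([query, pooled], dim=-1))
        log_w, delta, log_sigma = params.chunk(3, dim=-1)

        w = F.softmax(log_w, dim=-1)          # mixture weights
        mu = prev_mu + F.softplus(delta)      # means only move forward -> monotonic
        sigma = torch.exp(log_sigma).clamp(min=1e-3)

        pos = torch.arange(T, device=keys.device).float().view(1, T, 1)
        # Gaussian score of each source position under each mixture component.
        scores = torch.exp(-0.5 * ((pos - mu.unsqueeze(1)) / sigma.unsqueeze(1)) ** 2)
        align = (w.unsqueeze(1) * scores).sum(dim=-1)          # (B, T)
        align = align / (align.sum(dim=1, keepdim=True) + 1e-8)

        context = torch.bmm(align.unsqueeze(1), values).squeeze(1)
        return context, align, mu
```

Because the mean increment passes through a softplus, each mixture's center can never move backward along the source, which is one common way to enforce monotonicity and to support streaming decoding where only a bounded window of encoder frames around the current means needs to be available.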
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=6uMvA4Tt6_
