Keywords: Non-autoregressive, AMLP, linear complexity
TL;DR: We propose the Attentive Multi-Layer Perceptron (AMLP) to integrate the attention mechanism with the multi-layer perceptron (MLP) in a non-autoregressive architecture.
Abstract: Autoregressive (AR) generation dominates most sequence generation tasks owing to its efficacy. Recently, non-autoregressive (NAR) generation has gained increasing popularity for its efficiency and growing efficacy. However, its efficiency is still bottlenecked by softmax attention, whose computation time and memory cost are quadratic in sequence length. This bottleneck prevents non-autoregressive models from scaling to long-sequence generation, and little work has been done to mitigate it. In this paper, we propose a novel MLP variant, the Attentive Multi-Layer Perceptron (AMLP), to build a generation model with linear time and space complexity. Unlike a classic MLP with static, learnable projection matrices, AMLP leverages adaptive projections computed from the input in an attentive manner. Unlike softmax attention, AMLP uses these sample-aware adaptive projections to enable communication among tokens in a sequence and to model a similarity measure between the query and key spaces. Furthermore, we combine AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity. Empirical results show that NAR-AMLP surpasses competitive efficient NAR models by a significant margin on text-to-speech synthesis and machine translation. We also evaluate AMLP's self- and cross-attention abilities separately through extensive ablation experiments and find them comparable or even superior to other efficient models. An efficiency analysis further shows that AMLP speeds up inference and greatly reduces memory cost compared with vanilla non-autoregressive models. All experiments indicate that NAR-AMLP is a promising architecture in terms of both efficiency and efficacy.
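To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of a token-mixing layer whose projection matrix is computed from the input sequence itself (rather than being a static learned parameter), so that cost scales linearly with sequence length. This is an illustration under our own assumptions, not the paper's exact AMLP formulation; all class and variable names are invented for the example.

```python
# Hypothetical sketch (not the paper's exact AMLP formulation): a token-mixing
# layer with a sample-aware, input-dependent projection, costing O(n * d^2)
# in sequence length n instead of the O(n^2 * d) of softmax attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveMLPSketch(nn.Module):
    """Toy linear-complexity layer with an adaptive projection built from the input."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Static maps produce the per-token features used to *build* the
        # adaptive projection; the projection itself is input-dependent.
        self.to_q = nn.Linear(d_model, d_hidden)
        self.to_k = nn.Linear(d_model, d_hidden)
        self.to_v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = F.elu(self.to_q(x)) + 1.0   # positive feature map of queries
        k = F.elu(self.to_k(x)) + 1.0   # positive feature map of keys
        v = self.to_v(x)
        # Sample-aware adaptive projection: a (d_hidden, d_model) matrix
        # aggregated over all tokens of *this* input, not a learned constant.
        proj = torch.einsum("bnh,bnd->bhd", k, v)            # O(n * h * d)
        norm = torch.einsum("bnh,bh->bn", q, k.sum(dim=1))   # per-token normalizer
        out = torch.einsum("bnh,bhd->bnd", q, proj)          # apply adaptive projection
        return out / norm.unsqueeze(-1).clamp(min=1e-6)


if __name__ == "__main__":
    layer = AttentiveMLPSketch(d_model=64, d_hidden=32)
    y = layer(torch.randn(2, 128, 64))
    print(y.shape)  # torch.Size([2, 128, 64])
```

Because the adaptive projection is aggregated once per sequence and then applied to every token, memory and time grow linearly with sequence length, which is the property the abstract claims for NAR-AMLP.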
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip