LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
Abstract: Position-based re-weighting functions have recently been proposed as a promising approach to recover the model performance lost through conventional transformer linearization. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making them difficult or impossible to apply to autoregressive and simultaneous tasks, where the target sequence length, and sometimes even the input sequence length, is unknown beforehand. To resolve this issue and enable these re-weighting functions for a wider range of tasks, we propose Learned Proportions (LeaP) and LeaPformers. Our contribution is built on two major components. First, we generalize the dependence on explicit positional representations and sequence lengths into a dependence on sequence proportions for re-weighting, removing the theoretical need to know sequence lengths in advance. Second, we replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns. We validate the potential of LeaPformer against eight representative efficient transformers on the competitive Long-Range Arena benchmark, where LeaPformer achieves the best quality-throughput trade-off. We also demonstrate, for the first time, that position-based re-weighting functions can be applied to simultaneous tasks, achieving competitive results on speech-to-text translation for two language pairs.
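To make the abstract's core idea concrete, below is a minimal, hedged PyTorch sketch of what proportion-based re-weighting for linear attention could look like: a cosFormer-style cosine re-weighting in which the usual position/length ratio i/N is replaced by a per-token proportion predicted by a compact module. This is an illustration under stated assumptions (non-causal attention, ELU+1 feature maps, hypothetical names such as LeaPAttentionSketch and prop_net), not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeaPAttentionSketch(nn.Module):
    """Illustrative sketch only: linear attention re-weighted by learned proportions."""

    def __init__(self, dim: int, hidden: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Compact module: maps each token to a "proportion" in (0, 1), so the
        # re-weighting needs neither explicit positions nor the sequence length.
        self.prop_net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); non-causal case shown for brevity.
        q = F.elu(self.q_proj(x)) + 1.0          # positive query feature map
        k = F.elu(self.k_proj(x)) + 1.0          # positive key feature map
        v = self.v_proj(x)

        # Learned proportion per token, scaled into [0, pi/2].
        p = self.prop_net(x) * (math.pi / 2.0)   # (batch, seq_len, 1)

        # Re-weight attention by cos(p_i - p_j) = cos(p_i)cos(p_j) + sin(p_i)sin(p_j),
        # which splits into two rank-one terms and keeps linear complexity in seq_len.
        q_cos, q_sin = q * torch.cos(p), q * torch.sin(p)
        k_cos, k_sin = k * torch.cos(p), k * torch.sin(p)

        def summarize(kk: torch.Tensor):
            # Global key/value summaries used by linear attention.
            return torch.einsum('bnd,bne->bde', kk, v), kk.sum(dim=1)

        kv_c, z_c = summarize(k_cos)
        kv_s, z_s = summarize(k_sin)
        num = (torch.einsum('bnd,bde->bne', q_cos, kv_c)
               + torch.einsum('bnd,bde->bne', q_sin, kv_s))
        den = (torch.einsum('bnd,bd->bn', q_cos, z_c)
               + torch.einsum('bnd,bd->bn', q_sin, z_s))
        return num / den.clamp_min(1e-6).unsqueeze(-1)
```

Because the proportions are predicted from token content rather than from a known target length, a mechanism of this shape remains well-defined for autoregressive and simultaneous decoding; the causal variant would replace the global summaries with running prefix sums.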
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: Approaches to low-resource settings, Approaches to low compute settings / efficiency
Languages Studied: English, German, French
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.