Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Jiawen Xie; Pengyu Cheng; Xiao Liang; Yong Dai; nan du

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, nan du

22 Sept 2023 (modified: 17 Aug 2024)ICLR 2024 Conference Withdrawn SubmissionEveryoneRevisionsBibTeX

Primary Area: representation learning for computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: transformer, long-sequence processing, reinforcement learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: A simple framework for long-sequence processing

Abstract: Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the input sequence length. To alleviate the complexity of long-sequence processing, we propose a simple framework to enable the off-the-shelf pre-trained transformers to process much longer sequences, while the computation and memory costs remain growing linearly with the input sequence lengths. More specifically, our method divides each long-sequence input into a batch of chunks, then aligns the inter-chunk information during the encoding steps, and finally selects the most representative hidden states from the encoder for the decoding process. To extract inter-chunk semantic information, we align the start and end token embeddings among chunks in each encoding transformer block. To learn an effective hidden selection policy, we design a dual updating scheme inspired by reinforcement learning, which regards the transformers as environments, and leverages the attention scores and the downstream performance feedback as the rewards to optimize the hidden selection policy. Our empirical results on real-world long-text abstractive summarization and reading comprehension tasks demonstrate effective improvements compared to prior long-sequence processing baselines.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6271

Loading