Revisiting Transformer-based Models for Long Document Classification

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: The recent literature on text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page, multi-paragraph documents are common, and they cannot be efficiently encoded by vanilla Transformer-based models. We compare long document classification approaches that aim to mitigate the computational overhead of vanilla Transformers when encoding much longer text, namely sparse attention and hierarchical encoding methods. We examine several aspects of sparse-attention Transformers (e.g., size of the attention window, use of global attention) and hierarchical Transformers (e.g., document splitting strategy) on two different datasets, and we derive practical advice for applying Transformer-based models to long document classification tasks. We find that, if applied properly, Transformer-based models can outperform former state-of-the-art CNN-based models on MIMIC-III, a challenging dataset from the clinical domain.
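The abstract does not name specific architectures or hyperparameters, so the following is only a minimal sketch of the two families of approaches it mentions, assuming a Longformer-style sparse-attention classifier and a BERT-based hierarchical encoder built with the HuggingFace transformers library; the model names, window size, and splitting strategy shown here are illustrative assumptions, not the paper's reported setup.

```python
# Illustrative sketch only: sparse attention vs. hierarchical encoding for
# long document classification. Model choices and hyperparameters are assumptions.
import torch
from transformers import (AutoModel, AutoTokenizer,
                          LongformerForSequenceClassification,
                          LongformerTokenizerFast)

text = "a long multi-page, multi-paragraph document ..."  # placeholder input

# --- Sparse attention: Longformer with a configurable local attention window ---
lf_tok = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
lf_model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2, attention_window=256)
enc = lf_tok(text, truncation=True, max_length=4096, return_tensors="pt")
# Global attention on the [CLS] token (position 0); all other tokens attend locally.
global_attention_mask = torch.zeros_like(enc["input_ids"])
global_attention_mask[:, 0] = 1
logits = lf_model(**enc, global_attention_mask=global_attention_mask).logits

# --- Hierarchical encoding: split the document into segments, encode each
#     segment with a vanilla Transformer, then pool segment representations ---
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
ids = bert_tok(text, add_special_tokens=False)["input_ids"]
segments = [ids[i:i + 510] for i in range(0, len(ids), 510)]  # one possible splitting strategy
seg_vecs = []
for seg in segments:
    seg_ids = torch.tensor([[bert_tok.cls_token_id] + seg + [bert_tok.sep_token_id]])
    seg_vecs.append(bert(input_ids=seg_ids).last_hidden_state[:, 0])  # segment [CLS] vector
doc_vec = torch.stack(seg_vecs).mean(dim=0)  # mean-pool segments into a document vector
doc_logits = torch.nn.Linear(bert.config.hidden_size, 2)(doc_vec)
```

The sketch highlights the knobs the abstract refers to: the attention window and global attention mask on the sparse-attention side, and the segment length and pooling choice on the hierarchical side.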