Abstract: The choice of pooling strategy, together with layer selection and aggregation, plays a crucial role in the quality of sentence embeddings produced by Transformer-based models for classification. While the [CLS] token is commonly used as the sentence representation, research suggests that alternative pooling methods, such as token averaging, often yield better results. In this work, we systematically study various pooling techniques, including average, sum, and max pooling, as well as novel combinations, alongside layer aggregation strategies for sentence and document embeddings in encoder-only Transformer models. Additionally, we propose concatenating multiple pooling methods to represent a single document. Our experiments, conducted on multiple text classification benchmarks, demonstrate that carefully selecting pooling methods and layer combinations can improve classification accuracy by up to 9% compared to standard approaches. These findings emphasize the importance of exploring diverse strategies for sentence representation and offer valuable insights for optimizing embedding extraction in NLP tasks.
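To make the pooling and layer-aggregation strategies named in the abstract concrete, the following is a minimal sketch (not the paper's released code) of how such embeddings could be extracted with Hugging Face Transformers; the model name, variable names, and the specific choice of concatenated poolings are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative encoder-only model; the paper may use different checkpoints.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

texts = ["An example sentence.", "Another document to embed."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

hidden = outputs.last_hidden_state            # (batch, seq_len, dim)
mask = batch["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)

# [CLS] token embedding (the common default representation).
cls_emb = hidden[:, 0]

# Average pooling over non-padding tokens.
avg_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Sum pooling over non-padding tokens.
sum_emb = (hidden * mask).sum(dim=1)

# Max pooling: mask out padding positions before taking the max.
max_emb = hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values

# Concatenation of several pooled vectors into one document representation
# (the combination shown here is an assumption, not the paper's exact choice).
concat_emb = torch.cat([cls_emb, avg_emb, max_emb], dim=-1)

# One possible layer-aggregation strategy: average the last four hidden
# layers, then apply average pooling over tokens.
last4 = torch.stack(outputs.hidden_states[-4:], dim=0).mean(dim=0)
avg_last4_emb = (last4 * mask).sum(dim=1) / mask.sum(dim=1)
```

Any of the resulting vectors (e.g., `concat_emb` or `avg_last4_emb`) could then be fed to a downstream classifier; the paper compares such choices empirically rather than prescribing a single one.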
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: phrase/sentence embedding, semantic textual similarity
Contribution Types: NLP engineering experiment, Reproduction study
Languages Studied: English
Submission Number: 4912