Abstract: The choice of pooling strategy, together with layer selection and aggregation, plays a crucial role in the quality of sentence embeddings produced by Transformer-based models for classification. While the [CLS] token is commonly used as the sentence representation, research suggests that alternative pooling methods, such as token averaging, often yield better results. In this work, we systematically study various pooling techniques, including average, sum, and max pooling, as well as novel combinations, alongside layer aggregation strategies for sentence and document embeddings in encoder-only Transformer models. Additionally, we propose concatenating multiple pooling methods to represent a single document. Our experiments, conducted on multiple text classification benchmarks, demonstrate that carefully selecting pooling methods and layer combinations can improve classification accuracy by up to 9% compared to standard approaches. These findings emphasize the importance of exploring diverse strategies for sentence representation and offer valuable insights for optimizing embedding extraction in NLP tasks.
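To make the pooling and layer-aggregation strategies named in the abstract concrete, the following is a minimal sketch (not the paper's released code) of how such embeddings could be extracted with Hugging Face Transformers; the model name, variable names, and the specific choice of concatenated poolings are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative encoder-only model; the paper may use different checkpoints.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

texts = ["An example sentence.", "Another document to embed."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

hidden = outputs.last_hidden_state            # (batch, seq_len, dim)
mask = batch["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)

# [CLS] token embedding (the common default representation).
cls_emb = hidden[:, 0]

# Average pooling over non-padding tokens.
avg_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Sum pooling over non-padding tokens.
sum_emb = (hidden * mask).sum(dim=1)

# Max pooling: mask out padding positions before taking the max.
max_emb = hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values

# Concatenation of several pooled vectors into one document representation
# (the combination shown here is an assumption, not the paper's exact choice).
concat_emb = torch.cat([cls_emb, avg_emb, max_emb], dim=-1)

# One possible layer-aggregation strategy: average the last four hidden
# layers, then apply average pooling over tokens.
last4 = torch.stack(outputs.hidden_states[-4:], dim=0).mean(dim=0)
avg_last4_emb = (last4 * mask).sum(dim=1) / mask.sum(dim=1)
```

Any of the resulting vectors (e.g., `concat_emb` or `avg_last4_emb`) could then be fed to a downstream classifier; the paper compares such choices empirically rather than prescribing a single one.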
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: phrase/sentence embedding, semantic textual similarity
Contribution Types: NLP engineering experiment, Reproduction study
Languages Studied: English
Submission Number: 4912