Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers

Published: 01 Jan 2024, Last Modified: 14 May 2025, Inf. Syst. 2024, CC BY-SA 4.0
Abstract: Highlights
• The text preprocessing techniques available in the literature are discussed.
• The impact of the three most common techniques on SOTA models is evaluated.
• Text preprocessing can significantly affect the performance of Transformers.
• With appropriate preprocessing, traditional classifiers can outperform Transformers.
• The choice of preprocessing should depend on the models and datasets considered.