Accurate stemming of Dutch for text classification

Tanja Gaustad, Gosse Bouma

08 Jun 2023, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: This paper investigates the use of stemming for the classification of Dutch (email) texts. We introduce a stemmer that combines dictionary lookup (implemented efficiently as a finite-state automaton) with a rule-based backup strategy, and show that it outperforms the Dutch Porter stemmer in accuracy while not being substantially slower.
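The combination of dictionary lookup with a rule-based fallback can be illustrated with a small sketch. It is not the authors' implementation. The automaton here is a plain trie (an acyclic but unminimized finite-state automaton), and the dictionary entries, the `SUFFIX_RULES` table, and the `MIN_STEM` threshold are hypothetical examples chosen only to show the control flow: look the word up first, and apply suffix stripping only when the dictionary has no entry.

```python
from typing import Dict, List, Optional, Tuple


class TrieFSA:
    """Acyclic finite-state automaton built as a trie (not minimized).

    Each state is an index into `transitions`; accepting states map to a stem.
    """

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
        self.output: Dict[int, str] = {}                # accepting state -> stem

    def add(self, word: str, stem: str) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.output[state] = stem

    def lookup(self, word: str) -> Optional[str]:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return None  # no path: word is not in the dictionary
            state = nxt
        return self.output.get(state)  # None if the path ends in a non-accepting state


# Hypothetical suffix rules (checked in order), NOT the rule set from the paper.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("heden", "heid"),  # plural of -heid nouns: mogelijkheden -> mogelijkheid
    ("en", ""),         # common plural / infinitive ending
    ("s", ""),          # plural -s
]
MIN_STEM = 3  # hypothetical: never strip a suffix if fewer than 3 characters remain


def rule_stem(word: str) -> str:
    """Backup strategy: apply the first suffix rule whose result is long enough."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + replacement
    return word


def stem(word: str, fsa: TrieFSA) -> str:
    """Dictionary lookup first; fall back to suffix rules for unknown words."""
    w = word.lower()
    found = fsa.lookup(w)
    return found if found is not None else rule_stem(w)


# Hypothetical dictionary entries for irregular forms that rules would get wrong.
fsa = TrieFSA()
fsa.add("huizen", "huis")
fsa.add("steden", "stad")
```

With this toy dictionary, `stem("huizen", fsa)` returns `"huis"` from the automaton, whereas the rules alone would produce `"huiz"`; an unknown word such as `"boeken"` falls through to the rules and becomes `"boek"`. A production system would likely minimize the automaton to keep a large lexicon compact.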
