A lemma-based approach to a maximum entropy word sense disambiguation system for Dutch

08 Jun 2023, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: In this paper, we present a corpus-based supervised word sense disambiguation (WSD) system for Dutch which combines statistical classification (maximum entropy) with linguistic information. Instead of building individual classifiers per ambiguous wordform, we introduce a lemma-based approach. The advantage of this novel method is that it clusters all inflected forms of an ambiguous word in one classifier, thereby augmenting the training material available to the algorithm. Testing the lemma-based model on the Dutch SENSEVAL-2 test data, we achieve a significant increase in accuracy over the wordform model. Also, the WSD system based on lemmas is smaller and more robust.
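
The difference between the wordform model and the lemma-based model can be illustrated with a minimal sketch. It only shows how training instances would be grouped into per-classifier buckets; it does not train a maximum entropy classifier and is not the authors' implementation. The toy instances, the sense labels, and the helper name `group_instances` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy training data: (wordform, lemma, sense label) triples.
# These instances are invented for illustration, not taken from SENSEVAL-2.
TRAINING = [
    ("bal", "bal", "sense_1"),
    ("ballen", "bal", "sense_1"),
    ("bal", "bal", "sense_2"),
    ("ballen", "bal", "sense_1"),
    ("bals", "bal", "sense_2"),
]


def group_instances(instances, use_lemma):
    """Group instances into one bucket per classifier.

    With use_lemma=False each distinct wordform gets its own bucket
    (wordform model); with use_lemma=True all inflected forms that share
    a lemma land in the same bucket (lemma-based model).
    """
    buckets = defaultdict(list)
    for wordform, lemma, sense in instances:
        key = lemma if use_lemma else wordform
        buckets[key].append((wordform, sense))
    return dict(buckets)


wordform_buckets = group_instances(TRAINING, use_lemma=False)
lemma_buckets = group_instances(TRAINING, use_lemma=True)

# Report how many classifiers each grouping needs and how much data each gets.
for name, buckets in (("wordform", wordform_buckets), ("lemma", lemma_buckets)):
    sizes = {key: len(items) for key, items in buckets.items()}
    print(f"{name} model: {len(buckets)} classifier(s), instances per classifier: {sizes}")
```

In this toy setting the lemma grouping needs fewer classifiers, and its single classifier sees every instance, which is the effect the abstract describes as augmenting the training material.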