Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus

Evan Lucas

20 Feb 2024 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method that uses support vector machines to distinguish two of these dialects, Eastern and Southwestern Ojibwe, from a very small text corpus. Sentence-level classification accuracy is 90% under five-fold cross-validation, and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are openly released at https://github.com/evanperson/OjibweDialect
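The pipeline the abstract describes (train a support vector machine on sentences, score it with five-fold cross-validation, then apply it to single words) can be sketched roughly as follows. This is an illustration only, not the code from the linked repository. The abstract does not state the features, kernel, or training procedure, so the character-bigram features, the linear kernel, the Pegasos-style subgradient training, and all function names here are assumptions. The toy strings are synthetic placeholders, not Ojibwe text, and are separable by construction, so the accuracy they yield says nothing about the reported results.

```python
import random
from collections import Counter


def char_ngrams(text, n=2):
    """Count character n-grams; padding spaces mark the string boundaries."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def score(weights, feats):
    """Linear decision value: dot product of sparse weights and features."""
    return sum(weights.get(key, 0.0) * value for key, value in feats.items())


def predict(weights, feats):
    """Return +1 or -1 depending on the sign of the decision value."""
    return 1 if score(weights, feats) > 0 else -1


def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised
    hinge loss (assumed trainer; no bias term). Labels must be +1 or -1."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            # Evaluate the margin with the weights before this update.
            margin = labels[i] * score(weights, samples[i])
            # Shrink all weights (gradient of the L2 penalty).
            shrink = 1.0 - eta * lam
            for key in weights:
                weights[key] *= shrink
            # Hinge-loss subgradient applies only to margin violations.
            if margin < 1:
                for key, value in samples[i].items():
                    weights[key] = weights.get(key, 0.0) + eta * labels[i] * value
    return weights


def k_fold_accuracy(texts, labels, k=5, seed=0):
    """Mean held-out accuracy over k shuffled folds."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    feats = [char_ngrams(t) for t in texts]
    fold_accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        weights = train_linear_svm([feats[i] for i in train],
                                   [labels[i] for i in train])
        correct = sum(predict(weights, feats[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / k


# Synthetic placeholder "sentences" (NOT Ojibwe): two disjoint alphabets
# stand in for the two dialects so the example is separable by construction.
dialect_a = ["ab ba", "abab", "aab ab", "abba", "aba ba",
             "abb", "aabb ab", "ab", "abaab", "ab bab"]
dialect_b = ["xy yx", "xyxy", "xxy xy", "xyyx", "xyx yx",
             "xyy", "xxyy xy", "xy", "xyxxy", "xy yxy"]
texts = dialect_a + dialect_b
labels = [1] * len(dialect_a) + [-1] * len(dialect_b)

# Five-fold cross-validation at the "sentence" level.
accuracy = k_fold_accuracy(texts, labels, k=5)

# Train on all sentences, then apply the model to individual "words".
model = train_linear_svm([char_ngrams(t) for t in texts], labels)
word_predictions = [predict(model, char_ngrams(w)) for w in ["aba", "xyx"]]
```

Character n-grams are used here because they transfer from sentences to single words without any change to the feature extractor, which is one plausible way to support the sentence-to-word evaluation the abstract mentions. The actual repository may do this differently.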
