Abstract: The Ojibwe language has several dialects
that vary to some degree in both spoken
and written form. We present a method
that uses support vector machines to classify
text as one of two dialects (Eastern or Southwestern Ojibwe) using a very small corpus.
Classification accuracy at the sentence
level is 90% under five-fold cross-validation and 72% when the sentence-trained
model is applied to a data set of individual
words. Our code and the word-level data set
are released openly at https://github.com/evanperson/OjibweDialect
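
The pipeline summarized in the abstract (train on sentences, evaluate with five-fold cross-validation, then apply the sentence-trained model to single words) can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the abstract does not state the features or SVM variant, so the sketch assumes character n-gram counts and a small linear SVM trained by Pegasos-style subgradient descent. The strings in `TEXTS` are made-up placeholders, not Ojibwe, and every function name here is hypothetical.

```python
import random

def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams inside each whitespace-separated token."""
    counts = {}
    for token in text.lower().split():
        for n in range(n_min, n_max + 1):
            for i in range(len(token) - n + 1):
                gram = token[i:i + n]
                counts[gram] = counts.get(gram, 0) + 1
    return counts

def score(weights, feats):
    """Dot product between a sparse weight vector and sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def predict(weights, feats):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if score(weights, feats) > 0 else -1

def train_linear_svm(samples, labels, epochs=20, lam=0.01, seed=0):
    """Assumed trainer: subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            step += 1
            eta = 1.0 / (lam * step)            # decaying learning rate
            feats, y = samples[idx], labels[idx]
            violated = y * score(weights, feats) < 1
            shrink = 1.0 - 1.0 / step           # equals 1 - eta * lam
            for f in weights:
                weights[f] *= shrink            # regularization step
            if violated:                        # hinge-loss subgradient step
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights

def cross_validate(texts, labels, k=5):
    """Return per-fold accuracy using interleaved (index mod k) folds."""
    feats = [char_ngrams(t) for t in texts]
    accuracies = []
    for fold in range(k):
        test_idx = list(range(fold, len(texts), k))
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        model = train_linear_svm([feats[i] for i in train_idx],
                                 [labels[i] for i in train_idx])
        correct = sum(predict(model, feats[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

# Placeholder "sentences" (synthetic strings, NOT real Ojibwe data).
TEXTS = ["aba kaba", "baka ab", "kab aka ba", "abak ka", "baba kak a",
         "aka bak", "kaka ab ba", "ab aba", "bakab ka", "kaba baka",
         "ozo mozo", "zomo om", "moz ozo zo", "omoz mo", "zozo mom o",
         "ozo zom", "momo oz zo", "oz ozo", "zomoz mo", "mozo zomo"]
LABELS = [1] * 10 + [-1] * 10   # +1 / -1 stand in for the two dialects

fold_accuracies = cross_validate(TEXTS, LABELS, k=5)
print("mean sentence-level accuracy:", sum(fold_accuracies) / len(fold_accuracies))

# Apply a model trained on all sentences to individual (placeholder) words.
sentence_model = train_linear_svm([char_ngrams(t) for t in TEXTS], LABELS)
word_predictions = [predict(sentence_model, char_ngrams(w)) for w in ["kab", "zom"]]
print("word-level predictions:", word_predictions)
```

Computing n-grams within each token, rather than across word boundaries, is one way to let the same feature extractor serve both whole sentences and isolated words. The placeholder classes use disjoint alphabets and are trivially separable, so the numbers this toy prints say nothing about the accuracies reported above.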