N-gram and Neural Language Models for Discriminating Similar Languages

Andre Cianflone, Leila Kosseim

2016 (modified: 14 Jun 2024), VarDial@COLING 2016

Abstract: This paper describes our submission to the 2016 Discriminating Similar Languages (DSL) Shared Task. We participated in the closed Sub-task 1 with two separate machine learning techniques. The first approach is a character-based Convolutional Neural Network with an LSTM layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model of size 7. It achieved an accuracy of 88.45%, which is close to the accuracy of 89.38% achieved by the best submission.
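
The abstract does not say how the n-gram model scores a sentence, so the following is only a minimal sketch of one common way to build a character n-gram language identifier, not the authors' implementation. It assumes a per-language bag of character n-grams scored with add-alpha smoothing; the class name `CharNgramClassifier`, the parameter `alpha`, and the toy sentences and labels are all hypothetical. The toy example uses n = 3 so that the tiny training texts share n-grams with the test sentence; the paper's setting is n = 7.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`.

    Texts shorter than n yield a single gram (the whole text), an
    assumption made here so that short inputs still get a score.
    """
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramClassifier:
    """Hypothetical sketch: per-language n-gram counts, add-alpha smoothing."""

    def __init__(self, n=7, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.counts = {}    # label -> Counter of n-gram frequencies
        self.totals = {}    # label -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def score(self, text, label):
        """Smoothed log-likelihood of the text's n-grams under one language."""
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen grams
        counts = self.counts[label]
        denom = self.totals[label] + self.alpha * vocab_size
        return sum(
            math.log((counts[g] + self.alpha) / denom)
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the label with the highest log-likelihood."""
        return max(self.counts, key=lambda lab: self.score(text, lab))


# Toy usage with made-up training data (not the DSL corpus).
clf = CharNgramClassifier(n=3).fit(
    ["the cat sat on the mat", "le chat est sur le tapis"],
    ["en", "fr"],
)
prediction = clf.predict("the dog sat")
```

Long n-grams such as size 7 capture whole short words and word boundaries, which is plausibly why they help separate closely related languages, but they also make most test n-grams unseen, so the choice of smoothing matters more than in this toy setting.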
