Abstract: This work presents the results of comparison text representations used for short text classification with SVM and neural network when challenged with imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we set up a BOW method and then use more sophisticated approaches: word embeddings and transformer-based. The study were done on a dataset from a legal domain where the task was to select the topic of the discussion with the layer. The experiments indicate that fine-tuned pre-trained BERT model for this task gives the best results.
0 Replies
Loading