Short Texts Representations for Legal Domain Classification

Tomasz Zymkowski, Julian Szymanski, Andrzej Sobecki, Pawel Drozda, Konrad Szalapak, Kajetan Komar-Komarowski, Rafal Scherer

Published: 2022, Last Modified: 16 Oct 2023, ICAISC (1) 2022

Abstract: This work presents the results of a comparison of text representations used for short-text classification with an SVM and a neural network when faced with imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we use a bag-of-words (BOW) method, and then we apply more sophisticated approaches: word embeddings and transformer-based models. The study was conducted on a dataset from the legal domain, where the task was to select the topic of a discussion with a lawyer. The experiments indicate that a fine-tuned pre-trained BERT model gives the best results for this task.
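The abstract names a BOW representation fed to an SVM as the baseline for imbalanced topic classification. The code below is a minimal, dependency-free sketch of what such a baseline can look like. It is not the authors' implementation, and all of the following are assumptions: whitespace tokenization with raw term counts, a one-vs-rest linear SVM trained with Pegasos-style subgradient descent without an intercept, and scikit-learn-style "balanced" class weights to counter imbalance. The toy legal questions and the labels `family`, `housing` and `labour` are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def build_vocab(texts):
    # Assign a column index to every distinct training token.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow(text, vocab):
    # Sparse bag-of-words vector {column: count}; out-of-vocabulary tokens are dropped.
    counts = Counter(t for t in tokenize(text) if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}


def dot(w, x):
    return sum(w[j] * v for j, v in x.items())


def train_binary_svm(X, y, dim, lam=0.01, epochs=30, seed=0):
    # Pegasos-style stochastic subgradient descent on the class-weighted
    # hinge loss; labels are +1/-1 and there is no intercept term.
    n = len(y)
    n_pos = sum(1 for label in y if label == 1)
    # "Balanced" weights: the rarer side of the split gets a larger weight.
    cw = {1: n / (2.0 * n_pos), -1: n / (2.0 * (n - n_pos))}
    w = [0.0] * dim
    rng = random.Random(seed)
    order = list(range(n))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            # Regularization shrink: 1 - eta * lam simplifies to 1 - 1 / t.
            w = [wj * (1.0 - 1.0 / t) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward y_i * x_i.
                for j, v in X[i].items():
                    w[j] += eta * cw[y[i]] * y[i] * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    # One binary SVM per category: that category versus all the others.
    vocab = build_vocab(texts)
    X = [bow(t, vocab) for t in texts]
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_binary_svm(X, y, len(vocab), **kwargs)
    return vocab, models


def predict(text, vocab, models):
    # Pick the category whose classifier gives the highest score.
    x = bow(text, vocab)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy data: short legal questions with made-up topic labels.
train = [
    ("How to file for divorce?", "family"),
    ("Child custody after divorce", "family"),
    ("Landlord kept security deposit", "housing"),
    ("Eviction notice from landlord", "housing"),
    ("Employer withheld unpaid salary", "labour"),
    ("Dismissed by employer without warning", "labour"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]
vocab, models = train_one_vs_rest(texts, labels)
print(predict("My landlord won't return the deposit", vocab, models))  # prints "housing"
```

The hand-rolled training loop only makes the mechanics visible: a sparse count vector per question, one weighted hinge-loss classifier per topic, and an argmax over their scores. In practice, scikit-learn's `CountVectorizer` combined with `LinearSVC(class_weight="balanced")` would be the usual way to build this kind of baseline. The embedding and BERT representations the abstract compares against would replace `bow` with a learned encoder.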
