Application of Character-Level Language Models in the Domain of Polish Statutory Law

Aleksander Smywinski-Pohl, Krzysztof Wróbel, Karol Lasocki, Michal Jungiewicz

Published: 2019, Last Modified: 16 May 2025, JURIX 2019, CC BY-SA 4.0

Abstract: Polish statutory law has so far been distributed as PDF, HTML and text files, in which the structure of the rules and the references to internal and external regulations are provided only implicitly. As a result, automatic processing of the regulations in legal information systems is complicated, since the semi-structured text needs to be converted to a structured form. In this research, we show how character-level language models help in this task. We apply them to the problems of detecting the cross-references to structural units (e.g. articles and points) and detecting the cross-references to statutory laws (titles of laws and ordinances). We obtain 98.7% macro-average F1 in the first problem and 95.8% F1 in the second problem.
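
For readers unfamiliar with the metric reported for the first problem, the following is a minimal sketch of macro-averaged F1: an F1 score is computed for each class separately, and the per-class scores are then averaged with equal weight. The abstract does not describe the evaluation protocol, so the per-item labelling and the label names used here (`art`, `point`, `other`) are assumptions made purely for illustration.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per class.
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    scores = []
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        scores.append(f1)

    # Every class contributes equally, regardless of how frequent it is.
    return sum(scores) / len(scores)


# Hypothetical labels for four items; not data from the paper.
gold = ["art", "art", "point", "other"]
predicted = ["art", "point", "point", "other"]
score = macro_f1(gold, predicted)
```

Because each class carries the same weight, a rare type of structural unit influences the score as much as a frequent one, which is the usual reason for reporting a macro average on imbalanced label sets.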