CriminelBART: a French Canadian legal language model specialized in criminal law

ICAIL 2021 (modified: 14 Mar 2022)
Abstract: Learning language representations is a key component of many natural language processing tasks, and their usefulness is most often challenged by specialized target domains and vocabularies. We have witnessed several neural causal language models (CLMs) that learn contextual representations, such as ELMo [8]. More recently, the Transformer architecture [10] has tremendously improved language representation learning, giving birth to new architectures such as BERT [4], a masked language model that pushed natural language understanding to an unprecedented level of performance on standard benchmarks. Moreover, Transformer-based CLMs such as GPT [9] have been found to be excellent feature extractors as well as impressive text generators. BART [7], an architecture combining the backbones of both BERT and GPT, proved to be particularly effective at generating text while remaining competitive on comprehension tasks. BARThez, the French version of BART, was recently introduced as a model pre-trained on a very large monolingual French corpus [6]. In this paper, we introduce CriminelBART, a fine-tuned version of BARThez specialized for criminal law using a French Canadian corpus of legal judgments, and we evaluate its performance on different tasks.
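The BART design the abstract describes combines a BERT-style corrupted input with GPT-style autoregressive reconstruction; its main pre-training noise is text infilling, where sampled spans are each replaced by a single mask token. The sketch below illustrates that noising step only, in plain Python; the token values, parameters, and span-sampling details are illustrative assumptions, not the paper's actual preprocessing pipeline.

```python
import math
import random

MASK = "<mask>"  # placeholder mask token (illustrative, not BARThez's actual vocabulary entry)

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Sample a Poisson-distributed span length (Knuth's algorithm)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def text_infill(tokens, mask_ratio=0.3, lam=3.0, rng=None):
    """BART-style text infilling: replace sampled spans with one mask token each.

    Roughly `mask_ratio` of the tokens are removed in total; span lengths are
    Poisson-distributed (a length-0 span inserts a lone mask). The decoder is
    then trained to reconstruct the original sequence autoregressively.
    """
    rng = rng or random.Random(0)
    budget = int(len(tokens) * mask_ratio)  # total tokens to remove
    out = list(tokens)
    removed = 0
    while removed < budget and len(out) > 1:
        span = min(sample_poisson(lam, rng), budget - removed)
        start = rng.randrange(len(out))
        if span == 0:
            out.insert(start, MASK)  # length-0 span: insert a mask without removing anything
        else:
            span = min(span, len(out) - start)
            out[start:start + span] = [MASK]  # whole span collapses to a single mask
            removed += span
    return out

# Example on a (hypothetical) French legal sentence:
sentence = "le tribunal conclut que l'accusé est coupable de l'infraction reprochée".split()
noised = text_infill(sentence, rng=random.Random(42))
```

The single-mask-per-span choice is what forces the decoder to also predict how many tokens are missing, which is the part of the objective that GPT-style generation handles and BERT-style masking alone does not.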