Abstract: In this paper, we introduce Plum2Text, a new French Data-to-Text (D2T) dataset in the legal domain. It consists of pairs of plumitifs (docket files) and descriptions, derived from publicly available documents issued by Canadian criminal courts. Plum2Text is primarily intended for training statistical natural language generation algorithms, with the goal of making plumitifs easier for Canadian citizens to understand. The inputs and outputs of the dataset are unique: on the data side, the table values contain long passages of text, and on the text side, the reference most often consists of a paraphrase of the table values. We describe how we curated the plumitif-description associations by introducing an annotation tool and a methodology specific to the D2T natural language generation task. We do so by using simple yet efficient text classifiers that help the annotator leverage annotated examples during the annotation process. To protect privacy, we also illustrate how we decontextualize the descriptions.