Abstract: Automatic summarization of legal documents requires a thorough understanding of their specificities, mainly with respect to the vocabulary used by legal experts. Indeed, legal experts rely heavily on their external knowledge when writing summaries, in order to contextualize the main entities of the source document. This leads to reference summaries containing many abstractions that state-of-the-art models struggle to generate. In this paper, we propose an entity-driven approach that aims to teach the model to generate factual hallucinations as close as possible to the abstractions of the reference summaries. We evaluated our approach on two different datasets of legal documents, in English and in French. Results show that our approach reduces non-factual hallucinations and maximizes both summary coverage and factual hallucinations at the entity level. Moreover, the overall quality of the summaries also improves, showing that guiding summarization with entities is a valuable solution for legal document summarization.