Assisting Drafting of Chinese Legal Documents Using Fine-Tuned Pre-trained Large Language Models

Published: 01 Jan 2025, Last Modified: 13 May 2025, Rev. Socionetwork Strateg. 2025, License: CC BY-SA 4.0
Abstract: Fine-tuning pretrained large language models (LLMs) has become a mainstream paradigm for solving downstream natural language processing (NLP) tasks. However, training a language model for legal applications requires a large corpus of legal documents so that the model can learn legal terminology and the particularities of legal formatting. Typical NLP approaches rely on manually annotated datasets for training, but such datasets are difficult to obtain in the legal field. In this study, a large corpus of public, annotation-free Chinese legal documents, used without word segmentation, was employed to fine-tune a pretrained LLM to generate content for legal document drafts. Moreover, fine-tuning was performed locally, ensuring information privacy and improving security. Finally, an evaluation method for the generated documents was developed to objectively assess the quality of the drafts.
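The annotation-free setup described in the abstract can be sketched with a standard causal language modeling fine-tune, for example using the Hugging Face transformers library. This is a minimal illustration under stated assumptions: the base model name, corpus path, and hyperparameters below are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal sketch: local fine-tuning of a pretrained causal LM on raw
# (unannotated, unsegmented) Chinese legal text. Model name, file path,
# and hyperparameters are hypothetical placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "uer/gpt2-chinese-cluecorpussmall"  # assumed Chinese base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding needed for batching

# Plain-text legal documents, one per line; no manual annotation and no
# word segmentation -- the subword tokenizer works on raw text directly.
raw = load_dataset("text", data_files={"train": "legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard causal-LM objective (mlm=False): labels are the shifted inputs,
# so no labeled data is required beyond the corpus itself.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="legal-draft-model",  # written to local disk only
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

Because both the corpus and the resulting checkpoint stay on local storage, no document content leaves the machine during training, which matches the privacy motivation stated in the abstract.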