Keywords: Japanese contract, Legal provision annotation, Corpus construction
TL;DR: JGovCCC-PDL is a Japanese legal corpus annotated for clause-type classification, compiled from 212 public procurement contract documents (e.g., standard contracts and terms and conditions) issued by 32 Japanese public institutions.
Abstract: We present JGovCCC-PDL (Japanese Government Contract Clause Corpus based on Public Data License), a Japanese corpus annotated for clause-type classification in public procurement contracts. We collected 212 publicly available contract documents (e.g., standard contracts and terms and conditions) from 32 public institutions, including the Japanese national government and local municipalities. All documents were released under the Japanese Public Data License (Version 1.0) or comparable terms that permit reproduction, public transmission, and adaptation with appropriate attribution. We segmented the documents into 9,281 clauses and annotated each clause with two label levels: a fine-grained clause-type label (50 classes) and a coarse-grained category label (9 classes). We further define a leakage-resistant evaluation protocol tailored to duplicate-heavy procurement contracts. Because clause-type identification is a prerequisite for downstream contract review, clause retrieval, issue extraction, and later contract reasoning, we position JGovCCC-PDL as a foundational resource for Japanese contract-oriented Legal NLP.
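Illustrative sketch (not the protocol defined in the paper): the abstract does not spell out how the leakage-resistant evaluation works, so the following Python snippet only shows one common way to keep duplicate clauses from straddling a train/test split. It assumes that duplicates can be detected by NFKC normalization plus whitespace removal, and that each clause is a hypothetical `(text, label)` pair; the function names, the label strings, and the `test_ratio` parameter are all assumptions made for this example.

```python
import hashlib
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """NFKC-normalize and drop all whitespace so trivially different copies match."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def group_key(text: str) -> str:
    """Stable hash of the normalized clause text; identical clauses share a key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def leakage_resistant_split(clauses, test_ratio=0.2):
    """Split clause indices so that no duplicate group appears in both splits.

    clauses: list of (clause_text, label) pairs (hypothetical format).
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for i, (text, _label) in enumerate(clauses):
        groups[group_key(text)].append(i)

    train, test = [], []
    threshold = int(test_ratio * 2**32)
    for key in sorted(groups):
        # The first 8 hex digits of the hash act as a deterministic bucket,
        # so a whole duplicate group always lands on the same side.
        bucket = int(key[:8], 16)
        (test if bucket < threshold else train).extend(groups[key])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy clauses; the first two differ only in full-width characters.
    sample = [
        ("第1条 目的", "purpose"),
        ("第１条　目的", "purpose"),
        ("第2条 契約金額", "price"),
        ("第3条 秘密保持", "confidentiality"),
    ]
    train_idx, test_idx = leakage_resistant_split(sample, test_ratio=0.5)
    print("train:", train_idx, "test:", test_idx)
```

This sketch only catches exact duplicates after normalization. Near-duplicate clauses, which are likely in template-based procurement contracts, would need a similarity-based grouping step (or a split by document or issuing institution) before the same group-wise assignment is applied.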
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 11