Clause-Level Similarity and Network Analysis of Standard Service Contract Documents Used in Public Procurement across Japanese National Universities

Published: 22 May 2026, Last Modified: 22 May 2026 | ICAIL 2026 Workshop on Artificial Intelligence and Open Government | Everyone | CC BY 4.0
Keywords: national university, contract, public procurement, text analysis, natural language processing, document similarity
TL;DR: This paper examines publicly available standard contract documents for service procurement used by 41 Japanese national universities and identifies both shared standard provisions and university-specific variations through clause-level comparison.
Abstract: This study examines publicly available standard contract documents for service procurement used by Japanese national universities and identifies both shared standard provisions and university-specific variations through clause-level comparison. We collected documents from 41 national universities, segmented them into 1,432 clauses, aligned comparable clauses using 55 manually assigned and adjudicated label types, and computed inter-clause similarity using character 3-grams and BM25. We further analyzed both verbatim reuse and minor wording adjustments by constructing label-specific clause-similarity networks and an aggregate inter-university similarity measure. The results show that the documents do not converge on a single template; rather, they share a common structural core overlaid with university-specific wording adjustments. Verbatim reuse was observed across clauses in multiple categories, whereas termination clauses shared similar expressions while diverging into several wording clusters. A supplementary geographic analysis found a significant negative association between textual similarity and geographic distance among universities, suggesting that these contract documents are shaped not only by nationwide institutional commonalities but also by regionally shared templates and practical conventions. This study organizes similar contracts issued by different institutions into comparable clause-level data and presents a method for capturing standard provisions and local variations within a unified framework.
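Illustrative sketch: the abstract names character 3-grams and BM25 as the basis for inter-clause similarity but gives no formula or parameters. The following is a minimal sketch of one common way to combine the two, not the authors' implementation. The function names (`char_ngrams`, `bm25_scores`), the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the whitespace stripping, and the English sample clauses are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of text, ignoring whitespace."""
    s = "".join(text.split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def bm25_scores(query, clauses, k1=1.5, b=0.75, n=3):
    """Score each clause against the query clause with Okapi BM25.

    Terms are character n-grams. k1 and b are conventional defaults,
    assumed here; the abstract does not report the values used.
    """
    docs = [Counter(char_ngrams(c, n)) for c in clauses]
    lengths = [sum(d.values()) for d in docs]
    avgdl = sum(lengths) / len(lengths) if lengths else 0.0
    n_docs = len(docs)

    # Document frequency: number of clauses containing each n-gram.
    df = Counter()
    for d in docs:
        df.update(d.keys())

    query_terms = set(char_ngrams(query, n))
    scores = []
    for d, dl in zip(docs, lengths):
        score = 0.0
        for t in query_terms:
            tf = d.get(t, 0)
            if tf == 0:
                continue
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical clauses that would share one label (e.g. confidentiality),
# plus one unrelated clause.
sample_clauses = [
    "The contractor shall keep confidential all information.",
    "The contractor shall keep confidential any information obtained.",
    "Payment is due within thirty days.",
]
sample_scores = bm25_scores(sample_clauses[0], sample_clauses)
```

BM25 is asymmetric (scoring clause A against clause B generally differs from the reverse), so building an undirected similarity network from it requires an additional choice, such as averaging the two directions; the abstract does not state which choice was made.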
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 12