Bronze Inscription Restoration

ACL ARR 2025 May Submission 7804 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Bronze inscriptions from early China are often fragmentary, with missing or undeciphered characters limiting linguistic and historical analysis. Addressing this challenge requires models that can generalize across orthographic variation and diachronic script change. This paper introduces three contributions to support computational processing of bronze inscriptions: (i) a fully digitized and Unicode-encoded corpus of over 40,000 inscriptional characters; (ii) a glyph network linking diachronic variants to shared semantic anchors; and (iii) a masked language modeling (MLM) framework with variant-aware augmentation, alongside a periodization classification task. Experiments show that domain-adaptive pretraining and glyph-aware modeling substantially improve restoration accuracy.
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Low-resource methods, Epigraphic Chinese, Character restoration, Masked language modeling, Ancient language processing
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Ancient Chinese, Bronze Inscriptions
Submission Number: 7804