Interpretable Oracle Bone Script Decipherment through Radical and Pictographic Analysis with LVLMs

16 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, Everyone, CC BY 4.0
Keywords: Oracle Bone Script, Large Vision-Language Models
Abstract: As the oldest mature writing system, Oracle Bone Script (OBS) has long posed significant challenges for archaeological decipherment due to its rarity, abstractness, and pictographic diversity. Recent deep learning-based methods have made notable progress on the OBS decipherment task. However, they often ignore the intricate connections between OBS glyphs and their meanings, which limits generalization and interpretability. To address this, we propose an OBS decipherment method based on Large Vision-Language Models (LVLMs) that aims to bridge the gap between glyphs and meanings and to make the deciphering process interpretable. Specifically, we propose a progressive training strategy that guides the model from radical analysis to pictographic analysis and then to mutual analysis, enabling it to comprehend the rich semantic information embedded in OBS glyphs. The resulting analyses are used to retrieve decipherment results (i.e., the corresponding modern Chinese characters) from a dictionary via our proposed Radical-Pictographic Dual Matching mechanism, which makes the decipherment process interpretable. To facilitate model training, we also introduce the Pictographic Decipherment OBS Dataset, a well-organized dataset with detailed glyph analyses that comprises 3,173 OBS classes and 47,157 Chinese characters from different dynasties. Experiments on public benchmarks demonstrate that our method achieves competitive OBS decipherment performance and interpretability. This interpretability also allows our method to provide reference content that may be applicable to undeciphered OBS, giving it potential applications in historical research. The dataset and code repository will be released with the camera-ready version.
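To make the retrieval step concrete, the following is a minimal sketch of how a dictionary lookup driven by two kinds of analysis text could work. It is not the authors' Radical-Pictographic Dual Matching implementation, which the abstract does not specify. The `DictEntry` fields, the Jaccard token-overlap score, the `weight` parameter, and the toy dictionary are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """Hypothetical dictionary record; field names are illustrative only."""
    character: str   # modern Chinese character
    radicals: str    # free-text radical description (assumed format)
    pictograph: str  # free-text pictographic description (assumed format)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two whitespace-separated descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def dual_match(radical_analysis, pictographic_analysis, dictionary,
               weight=0.5, top_k=3):
    """Rank entries by a weighted sum of radical and pictographic similarity.

    The weighted sum is an assumed scoring rule, not one taken from the paper.
    """
    scored = []
    for entry in dictionary:
        score = (weight * jaccard(radical_analysis, entry.radicals)
                 + (1.0 - weight) * jaccard(pictographic_analysis, entry.pictograph))
        scored.append((entry.character, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy dictionary with made-up English descriptions, for demonstration only.
TOY_DICTIONARY = [
    DictEntry("明", "sun moon", "bright sun beside crescent moon"),
    DictEntry("休", "person tree", "person resting against tree"),
    DictEntry("林", "tree tree", "two trees side by side"),
]

if __name__ == "__main__":
    ranked = dual_match("person tree", "person leaning against tree", TOY_DICTIONARY)
    for character, score in ranked:
        print(character, round(score, 3))
```

Because each candidate's score decomposes into a radical term and a pictographic term, the ranking can be inspected term by term, which is one plausible reading of how such a retrieval step supports interpretability.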
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7143