Specializing Large Models for Oracle Bone Script Interpretation via Agent-Driven Multimodal Knowledge Augmentation

Jianing Zhang; Runan Li; Honglin Pang; Ding Xia; Zhou Zhu; Qian Zhang; Chuntao Li; Xi Yang

05 Sept 2025 (modified: 29 Dec 2025). ICLR 2026 Conference Withdrawn Submission. License: CC BY 4.0

Keywords: Oracle bone script; Graph RAG; AI-Assisted Archaeological Research; Multimodal Large Models

TL;DR: We design an agent-driven multimodal pipeline for Oracle Bone Script interpretation, support it with a novel expert-annotated component dataset, and enable large models to generate accurate interpretations.

Abstract: Deciphering oracle bone script, which originated over 3,000 years ago and is the earliest known mature writing system in China, is a fascinating but highly challenging problem. Vision-language models (VLMs) offer strong capabilities in perception, understanding, and reasoning, creating opportunities for cross-disciplinary research. However, they lack domain-specific knowledge, which often leads to suboptimal performance on this task. Existing approaches largely frame decipherment as an image recognition task, overlooking the hieroglyphic nature of oracle bone script and the structural and semantic information embedded in its component-based design. To address these limitations, we propose an agent-driven multimodal retrieval-augmented generation (RAG) framework that enables large models to act as domain experts for oracle bone research. We also introduce OB-Radix, a component-level oracle bone script dataset annotated by domain experts, which provides structural and semantic information absent from prior datasets. Furthermore, guided by expert knowledge, we design three benchmark tasks to systematically evaluate the ability of VLMs to decipher oracle bone script. Experimental results show that our framework produces more detailed and accurate interpretations than baseline methods. Beyond oracle bone script, our framework provides a methodological foundation for applying large models to the decipherment of other logographic writing systems.
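The abstract does not say how component information is retrieved and handed to the model. The following is a minimal sketch of the general idea, under stated assumptions: a component-level knowledge base lists each known character with its components and a short gloss, an unknown glyph has already been decomposed into components, and known characters are ranked by multiset Jaccard overlap with those components before the best matches are formatted as evidence for a VLM prompt. This is a simple overlap ranking, not the Graph RAG retrieval named in the keywords, and not the authors' implementation. The names `KnownChar`, `multiset_jaccard`, `retrieve`, `build_prompt`, and the three toy entries are hypothetical illustrations and are not drawn from OB-Radix.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class KnownChar:
    """Hypothetical knowledge-base entry: a character, its components, and a gloss."""
    char: str
    components: list[str]
    gloss: str


# Toy knowledge base for illustration only (NOT data from OB-Radix).
KB = [
    KnownChar("休", ["人", "木"], "rest: a person beside a tree"),
    KnownChar("林", ["木", "木"], "forest: two trees"),
    KnownChar("从", ["人", "人"], "follow: one person behind another"),
]


def multiset_jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity on multisets, so repeated components (e.g. 木 + 木) count."""
    ca, cb = Counter(a), Counter(b)
    inter = sum((ca & cb).values())  # element-wise minimum of counts
    union = sum((ca | cb).values())  # element-wise maximum of counts
    return inter / union if union else 0.0


def retrieve(query_components: list[str], kb: list[KnownChar], k: int = 2) -> list[KnownChar]:
    """Return up to k entries with nonzero overlap, best first (ties keep KB order)."""
    scored = [(multiset_jaccard(query_components, e.components), e) for e in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for score, e in scored[:k] if score > 0]


def build_prompt(query_components: list[str], evidence: list[KnownChar]) -> str:
    """Assemble retrieved component evidence into a plain-text prompt for a VLM."""
    lines = [
        f"Unknown glyph components: {', '.join(query_components)}",
        "Related known characters:",
    ]
    for e in evidence:
        lines.append(f"- {e.char} = {' + '.join(e.components)} ({e.gloss})")
    lines.append("Propose an interpretation and justify it from the components.")
    return "\n".join(lines)


if __name__ == "__main__":
    query = ["人", "木"]  # components assumed to come from an upstream detector
    print(build_prompt(query, retrieve(query, KB)))
```

For a glyph decomposed into 人 and 木, 休 matches exactly (score 1.0) and ranks first, while 林 and 从 each share one component (score 1/3). A full agent-driven pipeline would additionally need the glyph image, a component detector, richer retrieval over structural and semantic relations, and the model call itself, none of which is shown here.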

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 2334