Specializing Large Models for Oracle Bone Script Interpretation via Agent-Driven Multimodal Knowledge Augmentation
Keywords: Oracle bone script; Graph RAG; AI-Assisted Archaeological Research; Multimodal Large Models
TL;DR: We design an agent-driven multimodal pipeline for oracle bone script interpretation, support it with a novel expert-annotated component-level dataset, and enable large models to generate accurate interpretations.
Abstract: Deciphering oracle bone script, which originated over 3,000 years ago and is the earliest known mature writing system in China, is fascinating but highly challenging. Vision-language models (VLMs) offer strong capabilities in perception, understanding, and reasoning, opening opportunities for cross-disciplinary research. However, their lack of domain-specific knowledge often leads to suboptimal performance on oracle bone script. Existing approaches largely frame decipherment as an image recognition task, overlooking the pictographic nature of oracle bone script and the structural and semantic information embedded in its component-based design.
To address these challenges, we propose an agent-driven multimodal retrieval-augmented generation (RAG) framework that enables large models to act as domain experts in oracle bone research. We also introduce OB-Radix, a component-level oracle bone script dataset annotated by domain experts, which provides essential structural and semantic information absent from prior datasets. Furthermore, guided by expert knowledge, we design three benchmark tasks to systematically evaluate VLMs on oracle bone decipherment. Experimental results demonstrate that our framework produces more detailed and accurate interpretations than baseline methods.
Beyond oracle bone script, our framework establishes a methodological foundation for applying large models to the decipherment of other logographic writing systems.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2334