GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR

Published: 01 Sept 2025, Last Modified: 18 Nov 2025, Venue: ACML 2025 Conference Track, Readers: Everyone, License: CC BY 4.0
Abstract: Chinese-Kazakh cross-lingual information retrieval (CLIR) aims to retrieve relevant content from a collection of Kazakh documents using Chinese queries. The intrinsic differences in grammar, vocabulary, and semantic expression between the two languages pose significant challenges for semantic alignment in CLIR. Existing CLIR methods that incorporate multilingual knowledge graphs (MLKGs) typically integrate entity information through simple vector stacking, failing to exploit deeper entity relationships and semantic connections. To address these challenges, we propose GIIM, a graph information integration method for Chinese-Kazakh CLIR that uses the rich multilingual entity information embedded in an MLKG as a semantic bridge to narrow the linguistic gap during the query-document matching process. Unlike previous methods, GIIM unifies query-document pairs and entity information into a single graph structure and employs a Graph Convolutional Network (GCN) to aggregate both direct and multi-hop relations among entities, effectively modeling complex semantic paths and hierarchical knowledge propagation. To comprehensively evaluate GIIM, we construct CKIRD, a Chinese-Kazakh information retrieval dataset containing approximately 11,820 annotated query-paragraph pairs, and conduct experiments on both CKIRD and the public CLIRMatrix datasets. Experimental results show that GIIM outperforms existing baseline models across multiple ranking metrics, demonstrating its effectiveness on the Chinese-Kazakh CLIR task.
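The abstract does not specify GIIM's graph construction or layer design, so the following is only a minimal sketch of generic GCN propagation, not the authors' implementation. It assumes a hypothetical four-node path graph (query, Chinese entity, Kazakh entity, document), one-hot node features, and identity weight matrices, and uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2}. The point is to illustrate how stacking layers lets a node aggregate information from multi-hop neighbors.

```python
import math

def normalize_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list-of-lists matrix."""
    # Start from the identity so every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0  # undirected edge
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]

def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical toy graph (an assumption for illustration only):
# 0 = query, 1 = Chinese entity, 2 = Kazakh entity, 3 = document.
NODES = ["query", "entity_zh", "entity_kk", "document"]
EDGES = [(0, 1), (1, 2), (2, 3)]
N = len(NODES)

identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
a_hat = normalize_adjacency(EDGES, N)

h0 = identity                        # one-hot features, one column per node
h1 = gcn_layer(a_hat, h0, identity)  # after 1 layer: 1-hop neighborhood
h2 = gcn_layer(a_hat, h1, identity)  # after 2 layers: 2-hop neighborhood
```

In this toy setup, the query row `h1[0]` carries no weight from the Kazakh entity, whereas `h2[0]` does, because the second layer propagates information across the two-hop path query → Chinese entity → Kazakh entity. The document node, three hops away, would only be reached with a third layer.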
Supplementary Material: pdf
Submission Number: 102