Cross-Modal Information Retrieval - A Case Study on Chinese Wikipedia

Published: 01 Jan 2012, Last Modified: 22 Oct 2024, ADMA 2012, CC BY-SA 4.0
Abstract: Probabilistic models have recently been used in cross-modal multimedia information retrieval by building conjunctive models that bridge the text and image components. Previous studies have shown that a cross-modal information retrieval system using the topic correlation model (TCM) outperforms state-of-the-art models on an English corpus. In this paper, we focus on the Chinese language, which differs from Western languages written with alphabets. Words and characters are each chosen, in turn, as the basic structural unit of Chinese. We also set up a test database, named Ch-Wikipedia, in which documents with paired images and text are extracted from the Chinese-language Wikipedia website. We investigate the problem of retrieving texts (ranked by semantic closeness) given an image query, and vice versa. The capabilities of the TCM are verified by experiments on the Ch-Wikipedia dataset.
