Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval

Savvas A. Chatzichristofis, Konstantinos Zagoris, Avi Arampatzis

2011 (modified: 12 Nov 2022), SIGIR 2011

Abstract: The Bag-of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because it achieves better retrieval effectiveness than global feature representations on collections in which images are near-duplicates of the queries. In this experimental study, we demonstrate that this advantage of BOVW diminishes when visual diversity is increased by using a secondary modality, such as text, to pre-filter images. The TOP-SURF descriptor is evaluated against Compact Composite Descriptors on a two-stage image retrieval setup, which first uses a text modality to rank the collection and then performs CBIR only on the top-K items.
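The two-stage setup described in the abstract can be illustrated with a minimal sketch. It assumes that text retrieval scores are already computed for every image and that each image has a fixed-length visual descriptor stored as a plain vector. The names `two_stage_search`, `text_scores`, `descriptors`, and `query_descriptor` are hypothetical, and the Tanimoto coefficient is used here only as one plausible similarity for histogram-style descriptors; this is not presented as the exact configuration evaluated in the paper.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient between two non-negative vectors (1.0 means identical).

    Assumption: chosen for illustration; any visual similarity could be plugged in.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical to avoid division by zero.
    return dot / denom if denom > 0 else 1.0


def two_stage_search(
    text_scores: Dict[str, float],              # hypothetical: image id -> text retrieval score
    descriptors: Dict[str, Sequence[float]],    # hypothetical: image id -> visual descriptor
    query_descriptor: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    # Stage 1: rank the whole collection by the text modality and keep the top-K items.
    top_k = sorted(text_scores, key=lambda img: text_scores[img], reverse=True)[:k]
    # Stage 2: run CBIR only on the pre-filtered items, re-ranking by visual similarity.
    reranked = [(img, tanimoto(query_descriptor, descriptors[img])) for img in top_k]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


if __name__ == "__main__":
    text_scores = {"img1": 0.9, "img2": 0.7, "img3": 0.2, "img4": 0.8}
    descriptors = {
        "img1": [1.0, 0.0, 0.0],
        "img2": [0.0, 1.0, 0.0],
        "img3": [0.0, 1.0, 0.0],
        "img4": [0.5, 0.5, 0.0],
    }
    print(two_stage_search(text_scores, descriptors, [0.0, 1.0, 0.0], k=3))
```

In this toy run, `img3` is visually identical to the query but never reaches the visual stage because its text score places it outside the top-K. This mirrors the abstract's point: after text pre-filtering, the visual descriptor only has to discriminate among a small, already text-relevant candidate set, which is where a near-duplicate advantage may matter less.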
