Efficiently encoding term co-occurrences in inverted indexesOpen Website

2011 (modified: 11 Jan 2023)CIKM 2011Readers: Everyone
Abstract: Precomputation of common term co-occurrences has been successfully applied to improve query performance in large scale search engines based on inverted indexes. The results of such precomputations are traditionally stored as additional posting lists in the index. During query evaluation, these precomputed lists are used to reduce the number of query terms, as the results for multiple terms can be accessed through a single precomputed list. In this paper, we expand this paradigm by considering an alternative method for storing term co-occurrences in inverted indexes. For a selected set of terms in the index, we store bitmaps that encode term co-occurrences. A bitmap of size k for term t augments each posting to store the co-occurrences of t with k other terms, across every document in the index. At query evaluation, size k bitmaps can be used to answer queries that involve any of the 2^k combinations of the additional terms. In contrast, a precomputed list, although typically shorter, can only be used to evaluate queries containing all of its terms. We evaluate the bitmaps technique we propose, and the baseline of adding precomputed posting lists and show that they are complementary, as they capture different aspects of the query evaluation cost. We perform an experimental evaluation on the TREC WT10g corpus and show that a hybrid strategy combining both methods significantly lowers the cost of query evaluation compared to each method separately.
0 Replies

Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview