Keywords: graph indexing, graph retrieval, graph representation learning
TL;DR: We design inverted indices for graph-structured objects.
Abstract: Retrieving graphs from a large corpus, that contain a subgraph isomorphic to a given query graph, is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively.
We introduce CoRGII (COntextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text document-like representation allows us to leverage classic, highly optimized inverted indexes, while supporting soft (vector) set containment scores. Improving on this paradigm further, we replace the classical impact score of a `word' on a graph (such as defined by TFIDF or BM25) with a data-driven, trainable impact score.
Crucially, CoRGII is trained end-to-end using only binary relevance labels, without fine-grained supervision of query-to-document set alignments. Extensive experiments show that CoRGII provides better trade-offs between efficiency and accuracy, compared to several baselines.
Primary Area: Infrastructure (e.g., libraries, improved implementation and scalability, distributed solutions)
Submission Number: 27519
Loading