Contextual Tokenization for Graph Inverted Indices

Pritish Chakraborty; Indradyumna Roy; Soumen Chakrabarti; Abir De

Contextual Tokenization for Graph Inverted Indices

Pritish Chakraborty, Indradyumna Roy, Soumen Chakrabarti, Abir De

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: graph indexing, graph retrieval, graph representation learning

TL;DR: We design inverted indices for graph-structured objects.

Abstract: Retrieving graphs from a large corpus, that contain a subgraph isomorphic to a given query graph, is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CoRGII (COntextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text document-like representation allows us to leverage classic, highly optimized inverted indexes, while supporting soft (vector) set containment scores. Improving on this paradigm further, we replace the classical impact score of a `word' on a graph (such as defined by TFIDF or BM25) with a data-driven, trainable impact score. Crucially, CoRGII is trained end-to-end using only binary relevance labels, without fine-grained supervision of query-to-document set alignments. Extensive experiments show that CoRGII provides better trade-offs between efficiency and accuracy, compared to several baselines.

Primary Area: Infrastructure (e.g., libraries, improved implementation and scalability, distributed solutions)

Submission Number: 27519

Loading