Abstract: Given a large graph $G$, a subgraph query $Q$ finds the set of all subgraphs of $G$ that satisfy certain conditions specified by $Q$. Examples of subgraph queries including finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems that underutilize CPU cores when finding subgraphs, our prior system, G-thinker, was proposed that adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness that queries are answered in the order that they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users can be timely notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and designs a novel task-capsule list is to ensure fairness among queries. A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for 4 recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU-scalability.
External IDs:doi:10.1109/tkde.2025.3537964
Loading