$\textsc{Ted}^+$: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database

Kai Huang; Yue Cui; Qingqing Ye; Yan Zhao; Xi Zhao; Yao Tian; Kai Zheng; Haibo Hu; Xiaofang Zhou

Published: 01 Jan 2024, Last Modified: 25 Feb 2025. IEEE Trans. Knowl. Data Eng. 2024. License: CC BY-SA 4.0

Abstract: With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze graph databases containing extensive collections of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to provide insight into a graph database through a set of subgraph structures, they often yield similar or homogeneous topologies, which is undesirable in many graph applications. To address this limitation, we propose the Top-k Edge-Diversified Patterns Discovery problem, which retrieves a set of k subgraphs that together cover the maximum number of edges in a database. To process such queries efficiently, we present a generic and extensible framework called $\textsc {Ted}^+$, which achieves a guaranteed approximation ratio with respect to the optimal result. We further develop three optimization strategies to improve performance, and design a lightweight version called TedLite for even larger graph databases. Experimental studies on real-world datasets demonstrate the superiority of $\textsc {Ted}^+$ over traditional techniques.
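The objective stated in the abstract, choosing k patterns whose covered edges form the largest possible union, has the shape of the classic maximum-coverage problem. The block below is a minimal sketch under stated assumptions, not the authors' $\textsc {Ted}^+$ framework: it assumes every candidate pattern has already been matched against the database and is represented by the set of database edges its embeddings cover. The edge encoding `(graph_id, u, v)`, the function name `greedy_edge_cover`, and the toy data are all hypothetical. It applies the textbook greedy rule, which for maximum coverage is known to guarantee a (1 - 1/e) approximation; how $\textsc {Ted}^+$ generates candidates and obtains its own guarantee is not described in this abstract.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a database edge is (graph_id, u, v).
Edge = Tuple[int, int, int]


def greedy_edge_cover(
    candidates: Dict[str, Set[Edge]], k: int
) -> Tuple[List[str], Set[Edge]]:
    """Pick up to k patterns, each round taking the pattern that covers the
    most database edges not yet covered (greedy maximum coverage).
    This is an illustrative sketch, not the Ted+ algorithm itself."""
    chosen: List[str] = []
    covered: Set[Edge] = set()
    remaining = dict(candidates)  # shallow copy; input sets are not mutated
    for _ in range(k):
        if not remaining:
            break
        # max() returns the first maximal item, so iterating in sorted order
        # breaks ties by the lexicographically smallest pattern name.
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no remaining pattern adds a new edge
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical toy database with three data graphs (ids 0, 1, 2).
EXAMPLE: Dict[str, Set[Edge]] = {
    "triangle": {(0, 0, 1), (0, 1, 2), (0, 0, 2)},
    "path3": {(0, 0, 1), (0, 1, 2), (1, 0, 1), (1, 1, 2)},
    "star": {(1, 1, 2), (1, 1, 3), (1, 1, 4)},
    "edge": {(2, 0, 1)},
}

if __name__ == "__main__":
    patterns, edges = greedy_edge_cover(EXAMPLE, k=2)
    print(patterns, len(edges))
```

On the toy input with k = 2, the first pick is `path3` (4 new edges) and the second is `star` (2 new edges, since `(1, 1, 2)` is already covered), for 6 covered edges in total. The greedy choice by marginal gain is what discourages redundant, near-duplicate patterns: a pattern whose edges are already covered contributes nothing.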
