Efficient Maximal Biclique Enumeration on Large Uncertain Bipartite Graphs

Published: 01 Jan 2023, Last Modified: 13 May 2025 | IEEE Trans. Knowl. Data Eng. 2023 | CC BY-SA 4.0
Abstract: In this article, we study the problem of maximal biclique enumeration on large uncertain bipartite graphs. Given an uncertain bipartite graph $\mathcal{G}=(U,V,E,p)$, a probability threshold $\tau$, and two size constraints $\alpha$ and $\beta$, we aim to efficiently enumerate all maximal $\tau$-bicliques in $\mathcal{G}$, where a maximal $\tau$-biclique $B(L,R)$ is a complete subgraph of $\mathcal{G}$ such that (1) the probability of $B$ is no less than $\tau$, (2) $|L| \geq \alpha$ and $|R| \geq \beta$, and (3) $B$ is maximal among the complete subgraphs satisfying (1) and (2). This problem has many applications, such as biclustering of gene expression data, fraud detection, and similar group identification. Despite this wide range of applications, to the best of our knowledge, there are no efficient and scalable solutions to this problem in the literature. The problem is computationally challenging due to its #P-completeness. In this article, we propose a competitive branch-and-bound method, namely ${\sf MBEN}$, which explores the search space in a depth-first manner with a variety of pruning techniques. To improve the performance of ${\sf MBEN}$, we propose several novel and efficient search optimizations. First, we always select the side with fewer candidates to expand the search space, so that fruitless branches can be pruned early. Second, we devise an advanced pruning technique that considers size pruning and probability pruning at the same time to boost the pruning capacity. Last, we implement ${\sf MBEN}$ with pre-allocated arrays and pointer maintenance techniques, so that the frequent creation of working sets is replaced by swapping array elements. In addition, we introduce useful graph reduction techniques to further accelerate the computation. Comprehensive performance studies on 10 real datasets demonstrate that our proposals can significantly outperform the baseline methods by more than two orders of magnitude.
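To make conditions (1) and (2) of the definition concrete, here is a minimal Python sketch of a feasibility check for a candidate biclique $B(L,R)$. The abstract does not state how the probability of $B$ is computed; the sketch assumes, as is common for uncertain graphs, that edges exist independently and that the probability of $B$ is the product of its edge probabilities. The function names and the dictionary representation of $p$ are hypothetical and are not taken from ${\sf MBEN}$. The maximality condition (3) is not checked here.

```python
from itertools import product


def biclique_probability(L, R, p):
    """Probability of B(L, R) under the ASSUMED independent-edge model.

    p maps (u, v) pairs to edge probabilities. If any edge between L and R
    is absent from p, B is not a complete subgraph and 0.0 is returned.
    """
    result = 1.0
    for u, v in product(L, R):
        if (u, v) not in p:
            return 0.0
        result *= p[(u, v)]
    return result


def satisfies_constraints(L, R, p, tau, alpha, beta):
    """Check conditions (1) and (2) only; maximality (3) is not tested."""
    if len(L) < alpha or len(R) < beta:
        return False
    return biclique_probability(L, R, p) >= tau


# Hypothetical toy graph with U = {u1, u2} and V = {v1, v2}.
p = {
    ("u1", "v1"): 0.9,
    ("u1", "v2"): 0.8,
    ("u2", "v1"): 0.5,
    ("u2", "v2"): 1.0,
}
L, R = ["u1", "u2"], ["v1", "v2"]
feasible_loose = satisfies_constraints(L, R, p, tau=0.3, alpha=2, beta=2)
feasible_strict = satisfies_constraints(L, R, p, tau=0.4, alpha=2, beta=2)
```

On this toy graph the product of the four edge probabilities is about 0.36, so the candidate passes with $\tau = 0.3$ and fails with $\tau = 0.4$. Under the assumed model, adding a vertex to either side can only keep the product the same or lower it, which gives an intuition for why a probability threshold can be used to prune branches.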