Abstract: Community search has attracted widespread interest over the past decades. Among existing solutions, learning-based models achieve outstanding accuracy by leveraging labels to 1) train the model for community score learning, and 2)
select the optimal threshold for community identification. However, labeled data are not always available in real-world scenarios. To
address this notable limitation of learning-based models, we propose TransZero, a community search framework based on a pre-trained
graph Transformer that uses Zero labels (i.e., is unsupervised). TransZero has two key phases, i.e., the offline pre-training
phase and the online search phase. Specifically, in the offline pre-training phase, we design an efficient and effective community
search graph transformer (CSGphormer) to learn node representations. To pre-train CSGphormer without labels, we
introduce two self-supervised losses, i.e., personalization loss and link loss, motivated by the inherent uniqueness of each node and by the graph
topology, respectively. In the online search phase, with the representations learned by the pre-trained CSGphormer, we compute the
community score without using labels by measuring the similarity of representations between the query nodes and the nodes in
the graph. To free the framework from a label-based threshold, we define a new function named expected score gain
to guide the community identification process. Furthermore, we propose two efficient and effective algorithms for the community
identification process that run without labels. Extensive experiments on 10 public datasets demonstrate the superior
performance of TransZero in both accuracy and efficiency.
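
The label-free community score described above can be illustrated with a minimal sketch. It assumes that node representations are already available as plain vectors, that similarity is measured with cosine similarity, and that scores are averaged over the query nodes; the abstract does not fix any of these choices, and the names `cosine` and `community_scores` are hypothetical. The expected score gain function is not specified here and is therefore not sketched.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def community_scores(embeddings, query_nodes):
    """Score every node by its mean cosine similarity to the query nodes.

    embeddings: dict mapping node id -> list of floats. This is a stand-in
        for representations from a pre-trained model (an assumption).
    query_nodes: non-empty list of node ids present in `embeddings`.
    """
    if not query_nodes:
        raise ValueError("query_nodes must be non-empty")
    scores = {}
    for node, vec in embeddings.items():
        sims = [cosine(vec, embeddings[q]) for q in query_nodes]
        scores[node] = sum(sims) / len(sims)
    return scores

# Toy representations, invented purely for illustration.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
scores = community_scores(embeddings, ["a"])
# Nodes ordered from most to least similar to the query node "a".
ranked = sorted(scores, key=scores.get, reverse=True)
```

No labels enter this computation: the scores depend only on the representations and the query, which is what allows the identification step to operate on a ranking rather than on a label-tuned threshold.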