Abstract: This paper studies social network de-anonymization problem, which aims to identify users of an anonymized network by matching its user set with that of another auxiliary sanitized network. Prior arts primarily assume that both networks share exactly the same set of users, as opposed to many real situations of partially shared users in between. Different from the full matching case that only needs to take care of increasing the number of correctly matched pairs, the case of partial overlapping imposes additional demand on avoiding the wrong matches of those who do not have accounts across networks.To this end, we establish a new cost function, which we call the structural F-score to incorporate both the structural commonness and difference across networks. Intrinsically, the structural F-score computes the ratio of link agreements and disagreements, thus serving as the harmonic mean of precision and recall for any given matching function. Theoretically, we show that for networks parameterized by node overlap t2 and link overlap s2, as long as the mean degree of networks grows as Ω(t-2s-3 log n), maximizing the structural F-score provably ensures the perfect matching, where the nodal precision and recall are both maximized to 1. Algorithmically, for small-scale networks, we propose a two-step heuristic of F-score based de-anonymization, which firstly finds the optimal full matching between networks and then removes those pairs hindering structural F-score maximization. Due to the universal adaptability of the structural F-score, we further extend the algorithm to large-scale networks via a progressive matching process. Empirical results also validate the effectiveness of our methods in terms of improving the nodal F-score.
Loading