Abstract: In many applications, data objects can be represented as sets. For example, in video on-demand and social network services, the user data consists of a set of movies that have been watched and a set of users (friends), respectively, and they can be used for recommendation and information extraction. The problem of set similarity self-join hence has been studied extensively. Existing studies assume that sets are static, but in the above applications, sets are dynamically updated, and this requires continuous updating the join result. In this paper, we study a novel problem, dynamic set kNN self-join, i.e., for each set, we continuously compute its k nearest neighbor sets. Our problem poses a challenge for the efficiency of computation, because just an element insertion (deletion) into (from) a set may affect the kNN results of many sets. To address this challenge, we first investigate the property of the dynamic set kNN self-join problem to observe the search space derived from a set update. Then, based on this observation, we propose an efficient algorithm. This algorithm employs an indexing technique that enables incremental similarity computation and prunes unnecessary similarity computation. Our empirical studies using real datasets show the efficiency and scalability of our algorithm.
0 Replies
Loading