Abstract: Taxonomies are used to organize knowledge in many applications, including recommender systems, content browsing, or web search. With the emergence of new concepts, static taxonomies become obsolete as they fail to capture up-to-date knowledge. Several approaches have been proposed to address the problem of maintaining taxonomies automatically. These approaches typically rely on a limited set of neighbors to represent a given node in the taxonomy. However, considering distant nodes could improve the representation of some portions of the taxonomy, especially for those nodes situated in the periphery or in sparse regions of the taxonomy.In this work, we propose TaxoComplete, a self-supervised taxonomy completion framework that learns the representation of nodes leveraging their position in the taxonomy. TaxoComplete uses a self-supervision generation process that selects some nodes and associates each of them with an anchor set, which is a set composed of nodes in the close and distant neighborhood of the selected node. Using self-supervision data, TaxoComplete learns a position-enhanced node representation using two components: (1) a query-anchor semantic matching mechanism, which encodes pairs of nodes and matches their semantic distance to their graph distance, such that nodes that are close in the taxonomy are placed closely in the shared embedding space while distant nodes are placed further apart; (2) a direction-aware propagation module, which embeds the direction of edges in node representation, such that we discriminate <node, parent> relation from other taxonomic relations. Our approach allows the representation of nodes to encapsulate information from a large neighborhood while being aware of the distance separating pairs of nodes in the taxonomy. Extensive experiments on four real-world and large-scale datasets show that TaxoComplete is substantially more effective than state-of-the-art methods (2x more effective in terms of HIT@k).
Loading