KATIE: A System for Key Attributes Identification in Product Knowledge Graph Construction

Published: 01 Jan 2023, Last Modified: 18 Oct 2024SIGIR 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We present part of Huawei's efforts in building a Product Knowledge Graph (PKG). We want to identify which product attributes (i.e. properties) are relevant and important in terms of shopping decisions to product categories (i.e. classes). This is particularly challenging when the attributes and their values are mined from online product catalogues, i.e. HTML pages. These web pages contain semi-structured data, which do not follow a concerted format and use diverse vocabulary to designate the same features. We propose a system for key attribute identification (KATIE) based on fine-tuning pre-trained models (e.g., DistilBERT) to predict the applicability and importance of an attribute to a category. We also propose an attribute synonyms identification module that allows us to discover synonymous attributes by considering not only their labels' similarities but also the similarity of their values sets. We have evaluated our approach to Huawei categories taxonomy and a set of internally mined attributes from web pages. KATIE guarantees promising performance results compared to the most recent baselines.
Loading