Improving Taxonomy-based Categorization with Categorical Graph Neural Networks

Tianchuan Du, Keng-hao Chang, Paul Liu, Ruofei Zhang

Published: 01 Jan 2021, Last Modified: 19 Feb 2025, Venue: IEEE BigData 2021, License: CC BY-SA 4.0

Abstract: In search and retrieval, a critical subtask is the classification of user search queries into predefined categories. Traditional supervised multi-class classification algorithms usually treat each category independently. In practical applications, however, the categories have implicit relationships: they are organized as a tree-based taxonomy, which can be viewed as a graph. In this work, we explore a novel and systematic way of leveraging semantic information to improve taxonomy-based categorization. We propose a class of graph-based network structures, which we call Categorical Graph Neural Networks (CaGNN). CaGNNs leverage relationship information between neighboring categories and overlay the semantic information for each category, thus improving the performance of query categorization. The CaGNN framework can integrate a baseline categorizer with any Graph Neural Network, such as the commonly used Graph Attention Network and Graph Convolutional Network. On a query categorization dataset of 2k categories and an ad title categorization dataset of 5k categories, CaGNN significantly improves categorizer performance compared to a baseline Deep Neural Network model without the CaGNN structure. Notably, top-3 prediction recall increases from 90.15% to 91.40% on the ad title categorization task, a substantial gain given that recall already exceeds 90% across more than 5k categories. By inspecting the learned category embeddings and the flow of message passing, we show that CaGNN effectively encapsulates useful graph structural information. Online A/B testing shows that an ad ranking model with CaGNN-based features increased ad click-through rate by 1.81% and reduced defect rate by 2.64%. The model has been deployed to production.
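
The general idea of passing messages between neighboring categories in a taxonomy, and of measuring top-k recall, can be illustrated with a small NumPy sketch. This is not the CaGNN architecture itself, which the abstract does not specify in detail. It assumes a standard Graph Convolutional Network layer with symmetric normalization and self-loops, and a dot product between query vectors and category embeddings as the scoring function. The toy taxonomy, the random features, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(num_nodes, edges):
    """Symmetric-normalized adjacency with self-loops (standard GCN form)."""
    a = np.eye(num_nodes)
    for parent, child in edges:
        # Treat each taxonomy edge as undirected so messages flow both ways.
        a[parent, child] = 1.0
        a[child, parent] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def gcn_layer(a_hat, features, weight):
    """One round of message passing: aggregate neighbors, project, apply ReLU."""
    return np.maximum(a_hat @ features @ weight, 0.0)

def top_k_recall(logits, labels, k=3):
    """Fraction of rows whose true label is among the k highest-scoring categories."""
    top_k = np.argsort(-logits, axis=1)[:, :k]
    return float(np.mean([label in row for label, row in zip(labels, top_k)]))

# Hypothetical toy taxonomy: 0 is the root, 1 and 2 are its children,
# 3 and 4 are children of 1.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
rng = np.random.default_rng(0)
a_hat = normalized_adjacency(5, edges)

initial_emb = rng.normal(size=(5, 8))  # stand-in for per-category semantic features
weight = rng.normal(size=(8, 8))       # untrained projection, for illustration only
category_emb = gcn_layer(a_hat, initial_emb, weight)

query_vecs = rng.normal(size=(4, 8))   # stand-in for baseline categorizer outputs
logits = query_vecs @ category_emb.T   # shape: (4 queries, 5 categories)
```

In this sketch each category embedding becomes a weighted mix of its own features and those of its parent and children, which is one simple way the relationship information between neighboring categories could enter the classifier. A Graph Attention Network would replace the fixed normalization weights in `a_hat` with learned attention coefficients.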