Abstract: The importance of modeling contextual information within a search session has been widely acknowledged. However, learning representations of multi-query multi-modal (MM) search, in which Mobile Taobao users repeatedly submit textual and visual queries, remains unexplored in literature. Previous work which learns task-specific representations of textual query sessions fails to capture diverse query types and correlations in MM search sessions. This paper presents to represent MM search sessions by heterogeneous graph neural network (HGN). A multi-view contrastive learning framework is proposed to pretrain the HGN, with two views to model different intra-query, inter-query, and inter-modality information diffusion in MM search. Extensive experiments demonstrate that, the pretrained session representation can benefit state-of-the-art baselines on various downstream tasks, such as personalized click prediction, query suggestion, and intent classification.
0 Replies
Loading