Abstract: Recently, multi-task learning based on deep neural networks has been successfully applied in many recommender-system scenarios. The prediction quality of current mainstream multi-task models often depends on how well the relationships among tasks are captured. Much prior work has focused on two important tasks in recommender systems: predicting the click-through rate (CTR) and the post-click conversion rate (CVR), which rely on the sequential user action pattern of impression → click → conversion. There is therefore a sequential dependence between the CTR and CVR tasks. However, no existing network-structure design satisfactorily models this sequential dependence explicitly without sacrificing performance on the first task. In this paper, inspired by the Multi-task Network Cascades (MNC) and Adaptive Information Transfer Multi-task (AITM) frameworks, we propose a Multi-level Network Cascades Model (MNCM) built on the pattern of separating task-specific and shared experts. In MNCM, we introduce two types of information transfer modules, the Task-Level Information Transfer Module (TITM) and the Expert-Level Information Transfer Module (EITM), which adaptively learn transferred information at the task level and the task-specific-expert level, respectively, thereby fully capturing the sequential dependence among tasks. Compared with AITM, MNCM effectively prevents the first task in a task sequence from becoming the sacrificed side of the seesaw phenomenon and helps mitigate potential conflicts among tasks. We conduct extensive experiments on open-source large-scale recommendation datasets. The experimental results demonstrate that MNCM outperforms AITM as well as the mainstream baseline models in the mixture-of-experts-bottom pattern and the probability-transfer pattern. In addition, we conduct an ablation study that verifies the necessity and effectiveness of introducing the two kinds of information transfer modules.
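To make the notion of an adaptive information transfer module concrete, the following is a minimal PyTorch sketch of an attention-based transfer unit in the spirit of AITM, which the abstract says MNCM applies at both the task level (TITM) and the expert level (EITM). The module name, hidden size, and projection layout are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoTransferModule(nn.Module):
    """Adaptively fuses the previous task's representation into the
    current task's representation via a two-candidate attention.
    A sketch only; names and dimensions are assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        # Separate query / key / value projections, as in AITM-style attention.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, prev: torch.Tensor, curr: torch.Tensor) -> torch.Tensor:
        # Stack the transferred (previous-task) and current representations: (B, 2, D)
        inputs = torch.stack([prev, curr], dim=1)
        q = self.q_proj(inputs)
        k = self.k_proj(inputs)
        v = self.v_proj(inputs)
        # Scaled dot-product score per candidate, then softmax over the two: (B, 2)
        scores = (q * k).sum(dim=-1) / (inputs.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=1).unsqueeze(-1)       # (B, 2, 1)
        # Weighted sum of the value vectors yields the fused representation: (B, D)
        return (weights * v).sum(dim=1)
```

Under this reading, a TITM would fuse the CTR tower's output into the CVR tower, while an EITM would fuse the outputs of corresponding task-specific experts, so the sequential dependence is injected at multiple levels of the network.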