ProtoMix: Learnable Data Augmentation on Few-Shot Features with Vector Quantization in CTR Prediction
Abstract: Click-Through Rate (CTR) prediction is a critical problem in recommendation systems because of the enormous business interest it involves. Most deep CTR models follow an Embedding & Feature Interaction paradigm. However, the feature interaction module cannot work well without good embedding representations of the features. Because of the long-tail phenomenon in real-world scenarios, a large proportion of features have only a few samples in the dataset. In this paper, we present ProtoMix, a model-agnostic framework for learnable data augmentation on few-shot features in CTR prediction. ProtoMix automatically extracts information from the co-occurring features within the same instance, uses vector quantization to assign prototype embeddings to few-shot features, and then synthesizes the embedding representation of an augmented virtual instance for training. In ProtoMix, the original embeddings, the feature interaction module, and the embedding generator are jointly trained end-to-end on a well-designed objective. We experimentally validate the effectiveness and compatibility of ProtoMix by comparing it with the baseline and with other data augmentation methods on different deep CTR models and multiple real-world CTR benchmark datasets.
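The abstract does not give ProtoMix's equations, so the following is only a minimal, non-authoritative Python sketch of the general idea under stated assumptions. It summarizes the co-occurring features of an instance as the mean of their embeddings (an assumption). It then assigns the nearest prototype from a codebook, which is standard vector quantization. Finally, it builds the augmented embedding as a mixup-style convex interpolation between the few-shot feature's own embedding and that prototype (also an assumption). The names `mean_vector`, `quantize`, `augment`, and `lam` are hypothetical. The learning parts (the trainable codebook, gradient handling through the quantizer, and the joint objective) are omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of the co-occurring features' embeddings.

    Assumption: the context of a few-shot feature is summarized by a plain mean.
    """
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(context: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Vector quantization: return the index and a copy of the nearest prototype."""
    best = min(range(len(codebook)), key=lambda k: squared_distance(context, codebook[k]))
    return best, list(codebook[best])


def augment(own: Vector, prototype: Vector, lam: float) -> Vector:
    """Hypothetical mixup-style interpolation between a few-shot feature's own
    embedding and its assigned prototype; lam in [0, 1] weights the own embedding."""
    return [lam * o + (1.0 - lam) * p for o, p in zip(own, prototype)]


if __name__ == "__main__":
    # Toy codebook of three 2-d prototypes (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    # Embeddings of features that co-occur with the few-shot feature in one instance.
    co_occurring = [[0.5, 1.5], [1.5, 0.5]]
    context = mean_vector(co_occurring)
    idx, proto = quantize(context, codebook)
    virtual = augment([0.0, 2.0], proto, lam=0.5)
    print(idx, proto, virtual)  # 1 [1.0, 1.0] [0.5, 1.5]
```

Here the context `[1.0, 1.0]` is closest to prototype 1, and mixing with `lam=0.5` yields the virtual embedding `[0.5, 1.5]`. In a trainable setup, the codebook would be learned jointly with the CTR model rather than fixed as in this toy example.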