Keywords: Multi-Stage System, Data Synthesis
Abstract: As modern information processing systems grow more complex, multi-stage composite frameworks are receiving increasing attention, because models with divergent optimization objectives can extract more heterogeneous information. A prominent multi-stage framework widely adopted in industry is the multi-stage recommendation network, which comprises three sequential stages: Recall, Coarse Ranking, and Fine Ranking. Inspired by the data-centric paradigm, we seek to develop a unified data synthesis framework applicable across diverse training objectives. Specifically, we introduce the Unified Data Synthesis system (UniDS) for multi-stage frameworks, which adaptively provides unified structured data of varying quality at different stages to consistently enhance overall recommendation data quality. First, UniDS uses Real Entropy to evaluate data quality and, through Metric-Oriented Gradient Comparison Theory, demonstrates that different stage objectives exhibit distinct sensitivity to Real Entropy. Next, leveraging this difference, UniDS injects Real Entropy into user sequence segmentation via the Pattern Mining via Conditional Entropy module, aiming to mine interaction patterns among stages. Finally, UniDS establishes a unified, model-agnostic data generation architecture based on the Special Pattern-Token paradigm, which uses the entropy-separated patterns to generate new data and core task representations simultaneously. Together, these components yield a unified multi-stage data generation paradigm. Extensive experiments on benchmark datasets demonstrate improved performance for each model in the multi-stage system, greater flexibility in feature synthesis, and superior stage adaptation. Our anonymized code is available at https://anonymous.4open.science/r/UniDS-9510/
Primary Area: generative models
Submission Number: 4156