FeatureBox: Feature Engineering on GPUs for Massive-Scale Ads Systems

Published: 01 Jan 2022, Last Modified: 30 Sept 2024, Venue: IEEE Big Data 2022, License: CC BY-SA 4.0
Abstract: Deep learning has been widely deployed in online ads systems to predict click-through rate (CTR). Practitioners frequently re-train CTR models to test newly extracted features. Because CTR model training relies on a large volume of raw input logs, the feature extraction step takes a significant portion of the training time. In this paper, we propose FeatureBox, a novel end-to-end training framework that pipelines feature extraction and training on GPU servers, eliminating the intermediate I/O between the two stages. We rewrite computation-intensive feature extraction operators as GPU operators and leave the memory-intensive operators on CPUs. We introduce a layer-wise operator scheduling algorithm to schedule these heterogeneous operators. We present a light-weight GPU memory management algorithm that supports dynamic GPU memory allocation with minimal overhead. We experimentally evaluate FeatureBox and compare it with the previous in-production feature extraction framework on two ads applications. The results confirm the effectiveness of our proposed method.
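The abstract names a layer-wise operator scheduling algorithm but does not spell it out. As a rough illustration of the general idea only, and not the paper's actual algorithm, the sketch below groups a dependency graph of operators into layers (all operators in a layer have their inputs ready) and splits each layer by device. The operator names, the `"gpu"`/`"cpu"` tags, and the function `layerwise_schedule` are all hypothetical.

```python
from collections import defaultdict


def layerwise_schedule(ops, deps):
    """Group operators into dependency layers (Kahn-style topological layering).

    ops:  dict mapping operator name -> device tag, "gpu" or "cpu" (hypothetical tags)
    deps: dict mapping operator name -> list of operators it depends on
    Returns a list of layers; each layer is {"gpu": [...], "cpu": [...]}.
    """
    # Number of unfinished inputs per operator.
    indegree = {op: len(deps.get(op, [])) for op in ops}
    # Reverse edges: parent -> operators that consume its output.
    children = defaultdict(list)
    for op, parents in deps.items():
        for parent in parents:
            children[parent].append(op)

    ready = sorted(op for op, degree in indegree.items() if degree == 0)
    layers = []
    scheduled = 0
    while ready:
        layer = {"gpu": [], "cpu": []}
        next_ready = []
        for op in ready:
            layer[ops[op]].append(op)
            scheduled += 1
            # Finishing this operator may unblock its consumers.
            for child in children[op]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        layers.append(layer)
        ready = sorted(next_ready)

    if scheduled != len(ops):
        raise ValueError("operator graph contains a cycle")
    return layers


# Hypothetical feature-extraction pipeline, for illustration only.
EXAMPLE_OPS = {
    "read_logs": "cpu",
    "join_dict": "cpu",       # memory-intensive, kept on CPU
    "hash_ids": "gpu",        # compute-intensive, moved to GPU
    "cross_features": "gpu",
}
EXAMPLE_DEPS = {
    "join_dict": ["read_logs"],
    "hash_ids": ["read_logs"],
    "cross_features": ["join_dict", "hash_ids"],
}

layers = layerwise_schedule(EXAMPLE_OPS, EXAMPLE_DEPS)
for index, layer in enumerate(layers):
    print(index, layer)
```

In this toy graph the CPU and GPU operators in the middle layer have no dependency on each other, so a runtime could execute them concurrently. How FeatureBox actually orders, batches, or overlaps operators is described in the paper itself.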