Keywords: Self Supervised Learning, Large Scale Tabular Data, Pre-training, Robot Detection, Advertising, Manifold Mixup, Noise Contrastive Estimation
TL;DR: Pre-training using reconstruction on Large Scale Tabular Data utilizing Manifold Mixup and Noise Contrastive Estimation
Abstract: In this paper, we tackle the problem of self supervised pre-training of deep neural networks for large scale tabular data in online advertising. Self supervised learning has recently been very effective for pre-training representations in domains such as vision, natural language processing, etc. But unlike these, designing self supervised learning tasks for tabular data is inherently challenging. Tabular data can consist of various types of data with high cardinality and range of feature values especially in a large scale real world setting. To that end, we propose a self supervised pre-training strategy that utilizes Manifold Mixup to produce data augmentations for tabular data and perform reconstruction on these augmentations using noise contrastive estimation and mean absolute error losses, both of which are particularly suitable for large scale tabular data. We demonstrate its efficacy by evaluating on the problem of click fraud detection on ads to obtain a 9\% relative improvement on robot detection metrics over a supervised learning baseline and 4\% over a contrastive learning experiment.