Self-supervised Representation Learning Across Sequential and Tabular Features Using TransformersDownload PDF

Published: 21 Oct 2022, Last Modified: 16 May 2023TRL @ NeurIPS 2022 PosterReaders: Everyone
Keywords: self-supervised learning, tabular data, sequential data, robot detection, advertising, transformer
TL;DR: Using Transformer based masked token prediction to learn self-supervised representations across sequential and tabular inputs
Abstract: Machine learning models used for predictive modeling tasks spanning across personalization, recommender systems, ad response prediction, fraud detection etc. typically require a variety of tabular as well as sequential activity features about the user. For tasks like click-through or conversion (purchase) rate prediction where labeled data is available at scale, popular methods use deep sequence models (sometimes pre-trained) to encode sequential inputs, followed by concatenation with tabular features and optimization of a supervised training objective. For tasks like bot and fraud detection, where labeled data is sparse and incomplete, the typical approach is to use self-supervision to learn user embeddings from their historical activity sequence. However, these models are not equipped to handle tabular input features during self-supervised learning. In this paper, we propose a novel Transformer architecture that can jointly learn embeddings on both sequential and tabular input features. Our model learns self-supervised user embeddings using masked token prediction objective on a rich variety of features without relying on any labeled data. We demonstrate that user embeddings generated by the proposed technique are able to successfully encode information from a combination of sequential and tabular features, improving AUC-ROC for linear separability for a downstream task label by $5\%$ over embeddings generated using sequential features only. We also benchmark the efficacy of the embeddings on the bot detection task for a large-scale digital advertising program, where the proposed model improves recall over known bots by $10\%$ over the sequential only baseline at the same False Positive Rate (FPR).
0 Replies