MET : Masked Encoding for Tabular data

Kushal Alpesh Majmundar; Sachin Goyal; Praneeth Netrapalli; Prateek Jain

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone

Keywords: Tabular Data, Self Supervised Learning, Masked Auto-Encoder

TL;DR: A masking-based algorithm for SSL on tabular datasets. Key idea: a latent graphical model captures the relations between different coordinates, and classification in the latent space is easy. Masking-based SSL learns this latent structure.

Abstract: We propose $\textit{Masked Encoding for Tabular Data (MET)}$ for learning self-supervised representations from $\textit{tabular data}$. Tabular self-supervised learning (tabular-SSL) is more challenging than SSL for structured domains such as images, audio, and text, because each tabular dataset can have a completely different structure among its features (or coordinates), which is hard to identify a priori. $\textit{MET}$ attempts to circumvent this problem by adopting the following hypothesis: the observed tabular features come from a latent graphical model, and the downstream tasks are significantly easier to solve in the latent space. Based on this hypothesis, $\textit{MET}$ uses random-masking-based encoders to learn a positional embedding for each coordinate, which would in turn capture the latent structure between coordinates. Through experiments on a toy dataset generated from a linear graphical model, we show that $\textit{MET}$ is indeed able to capture the latent graphical model. In practice, through extensive experiments on multiple benchmarks for tabular data, we demonstrate that $\textit{MET}$ significantly outperforms all the baselines. For example, on Criteo, a large-scale click prediction dataset, $\textit{MET}$ achieves as much as a $5\%$ improvement over the current state of the art (SOTA), whereas purely supervised approaches have advanced SOTA by at most $2\%$ in the last six years. Furthermore, averaged over $\textit{nine}$ datasets, $\textit{MET}$ is around $3.9\%$ more accurate than the next best method, gradient-boosted decision trees, which are considered SOTA for the tabular setting.
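The abstract describes the input side of the method only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that each coordinate's value is paired with its own embedding vector, that a random subset of coordinates is hidden from the encoder, and that a reconstruction error is scored on the hidden coordinates. The function names, the `mask_ratio` parameter, the embedding size, and the choice of mean squared error over masked coordinates are all hypothetical, and the embeddings here are random stand-ins for parameters that a real model would train.

```python
import random


def init_embeddings(num_features, dim, rng):
    """One embedding vector per coordinate (random stand-ins for trainable parameters)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]


def random_mask(num_features, mask_ratio, rng):
    """Pick a random subset of coordinates to hide; return (masked, visible) index lists."""
    num_masked = int(round(mask_ratio * num_features))
    masked = sorted(rng.sample(range(num_features), num_masked))
    masked_set = set(masked)
    visible = [j for j in range(num_features) if j not in masked_set]
    return masked, visible


def build_tokens(row, indices, embeddings):
    """Assumed token layout: the scalar value of coordinate j followed by embedding j."""
    return [[row[j]] + embeddings[j] for j in indices]


def reconstruction_loss(row, predicted, masked):
    """Mean squared error over the masked coordinates (an illustrative choice)."""
    if not masked:
        return 0.0
    return sum((row[j] - predicted[j]) ** 2 for j in masked) / len(masked)


rng = random.Random(0)
row = [0.5, -1.2, 3.0, 0.0, 2.5, -0.7]  # one tabular example with six coordinates
embeddings = init_embeddings(len(row), dim=4, rng=rng)
masked, visible = random_mask(len(row), mask_ratio=0.5, rng=rng)

# Only the visible coordinates would be passed to the encoder.
encoder_input = build_tokens(row, visible, embeddings)

# Placeholder "decoder" output: predicts zero everywhere, just to exercise the loss.
predicted = [0.0] * len(row)
loss = reconstruction_loss(row, predicted, masked)
```

In a trained model the encoder and decoder would be learned networks and the embeddings would be updated by gradient descent; the sketch only shows how a masked training example could be assembled under the assumptions stated above.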

Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
