SALT: Sharing Attention between Linear layer and Transformer for tabular dataset

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone
Keywords: Tabular data, Attention matrix, Transformer, Deep learning
Abstract: Handling tabular data with deep learning models remains a challenging problem despite their remarkable success in vision and language applications. Consequently, many practitioners still rely on classical models such as gradient boosting decision trees (GBDTs) rather than deep networks because of their superior performance on tabular data. In this paper, we propose a novel hybrid deep network architecture for tabular data, dubbed SALT (Sharing Attention between Linear layer and Transformer). SALT consists of two blocks, a Transformer block and a linear-layer block, which take advantage of shared attention matrices. The shared attention matrices enable the Transformer and the linear layers to cooperate closely, which leads to improved performance and robustness. Our algorithm outperforms tree-based ensemble models and previous deep learning methods on multiple benchmark datasets. We further demonstrate the robustness of SALT in semi-supervised learning and in pre-training scenarios with small datasets.
One-sentence Summary: A novel hybrid deep network architecture for tabular data, dubbed SALT (Sharing Attention between Linear layer and Transformer).
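Since only the abstract is available here, the following is a minimal NumPy sketch of one way a shared attention matrix between a Transformer-style branch and a linear-layer branch could look. The shapes, branch structure, and the way the linear block reuses the attention matrix are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the SALT authors' code): sharing one attention
# matrix between a Transformer-style branch and a linear-layer branch.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_features, d_model = 8, 16              # treat tabular features as tokens (assumed)
x = rng.normal(size=(n_features, d_model))

# Transformer branch: standard scaled dot-product attention.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
A = softmax(Q @ K.T / np.sqrt(d_model))  # the shared attention matrix
transformer_out = A @ V

# Linear-layer branch: reuses the same attention matrix A to mix features
# before its own linear projection (one possible reading of "sharing").
W_lin = rng.normal(size=(d_model, d_model))
linear_out = (A @ x) @ W_lin

print(transformer_out.shape, linear_out.shape)  # (8, 16) (8, 16)
```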