Keywords: Language Model; Tabular Data; Natural Language Processing
Abstract: Transformer-based language models have become the de facto standard in natural language processing. However, they underperform traditional tree-based methods on tabular data. We posit that current approaches fail to realize the full potential of language models due to (i) the heterogeneity of tabular data and (ii) the difficulty models face in interpreting numerical values. Based on this hypothesis, we propose a method titled Tabular Domain Transformer (TDTransformer). TDTransformer uses distinct embedding processes for different column types, and alignment layers then transform the per-type column embeddings into a common embedding space. In addition, TDTransformer adapts piece-wise linear encoding of numerical values to transformer-based architectures. We evaluate the proposed method on 76 real-world tabular classification datasets from the standard OpenML benchmark. Extensive experiments indicate that TDTransformer significantly improves over state-of-the-art methods.
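For readers unfamiliar with the piece-wise linear encoding (PLE) mentioned in the abstract, below is a minimal NumPy sketch of the general technique: a numerical value is encoded as a vector with one entry per bin, set to 1 for bins fully below the value, 0 for bins fully above it, and a fractional position within the bin containing it. This is a generic illustration under assumed quantile bins, not the authors' implementation; the data and bin count are hypothetical.

```python
import numpy as np

def piecewise_linear_encode(x, bin_edges):
    """Piece-wise linear encoding (PLE) of scalar values.

    bin_edges: sorted array of T+1 edges defining T bins (e.g., empirical
    quantiles of the training column). Returns an (n, T) array where each
    row is 1 for bins below the value, 0 for bins above it, and the
    fractional position inside the bin that contains it.
    """
    x = np.asarray(x, dtype=np.float64)[:, None]   # shape (n, 1)
    left, right = bin_edges[:-1], bin_edges[1:]    # per-bin boundaries, shape (T,)
    frac = (x - left) / (right - left)             # relative position in each bin
    return np.clip(frac, 0.0, 1.0)                 # clamp to [0, 1] outside the bin

# Example with hypothetical data: 8 quantile bins fit on a training column.
rng = np.random.default_rng(0)
train_col = rng.normal(size=1000)
edges = np.quantile(train_col, np.linspace(0.0, 1.0, 9))
print(piecewise_linear_encode(train_col[:3], edges))
```

In transformer-based models, such PLE vectors are typically projected by a learned linear layer to the token embedding dimension before entering the attention stack.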
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8383