Keywords: Language Model; Tabular Data; Natural Language Processing
Abstract: Transformer-based language models have become the de facto standard in natural language processing. However, they underperform traditional tree-based methods on tabular data. We posit that current approaches fail to realize the full potential of language models due to (i) the heterogeneity of tabular data and (ii) the difficulty models face in interpreting numerical values. Based on this hypothesis, we propose a method titled Tabular Domain Transformer (TDTransformer). TDTransformer uses distinct embedding processes for different column types, and alignment layers then transform the per-type column embeddings into a common embedding space. In addition, TDTransformer adapts piece-wise linear encoding of numerical values to transformer-based architectures. We evaluate the proposed method on 76 real-world tabular classification datasets from the standard OpenML benchmark. Extensive experiments indicate that TDTransformer significantly improves over state-of-the-art methods.
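For readers unfamiliar with the piece-wise linear encoding (PLE) mentioned in the abstract, below is a minimal NumPy sketch of the general technique: a numerical value is encoded as a vector with one entry per bin, set to 1 for bins fully below the value, 0 for bins fully above it, and a fractional position within the bin containing it. This is a generic illustration under assumed quantile bins, not the authors' implementation; the data and bin count are hypothetical.

```python
import numpy as np

def piecewise_linear_encode(x, bin_edges):
    """Piece-wise linear encoding (PLE) of scalar values.

    bin_edges: sorted array of T+1 edges defining T bins (e.g., empirical
    quantiles of the training column). Returns an (n, T) array where each
    row is 1 for bins below the value, 0 for bins above it, and the
    fractional position inside the bin that contains it.
    """
    x = np.asarray(x, dtype=np.float64)[:, None]   # shape (n, 1)
    left, right = bin_edges[:-1], bin_edges[1:]    # per-bin boundaries, shape (T,)
    frac = (x - left) / (right - left)             # relative position in each bin
    return np.clip(frac, 0.0, 1.0)                 # clamp to [0, 1] outside the bin

# Example with hypothetical data: 8 quantile bins fit on a training column.
rng = np.random.default_rng(0)
train_col = rng.normal(size=1000)
edges = np.quantile(train_col, np.linspace(0.0, 1.0, 9))
print(piecewise_linear_encode(train_col[:3], edges))
```

In transformer-based models, such PLE vectors are typically projected by a learned linear layer to the token embedding dimension before entering the attention stack.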
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8383