LLM-Guided Self-Supervised Tabular Learning With Task-Specific Pre-text Tasks

Published: 18 Apr 2025, Last Modified: 18 Apr 2025 · Accepted by TMLR · CC BY 4.0
Abstract: One of the most common approaches for self-supervised representation learning is defining pre-text tasks to learn data representations. Existing works determine pre-text tasks in a "task-agnostic" way, without considering the forthcoming downstream tasks. This offers broad applicability across tasks, but can also create a mismatch between the pre-text and downstream objectives, potentially degrading downstream performance. In this paper, we introduce TST-LLM, a framework that effectively reduces this mismatch when a natural-language description of the downstream task is given without any ground-truth labels. TST-LLM instructs an LLM to use the downstream task's description and the data's meta-information to discover features relevant to the target task. These discovered features are then treated as ground-truth labels to define "target-specific" pre-text tasks. TST-LLM consistently outperforms contemporary baselines, such as STUNT and LFR, with win ratios of 95% and 81%, respectively, across 22 benchmark tabular datasets spanning binary classification, multi-class classification, and regression tasks.
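The sketch below illustrates the idea described in the abstract, not the authors' released implementation: prompt an LLM with the downstream task description and the table's meta-information to propose target-relevant features, then use those features as pseudo-labels for a pretext prediction task. All names here (`build_prompt`, `query_llm`, `llm_pseudo_targets`, `pretrain_encoder`) are hypothetical; see the linked repository for the actual code.

```python
# Conceptual sketch only (assumptions, not the paper's implementation):
# an LLM proposes target-relevant features from the task description and
# column metadata; those features become pretext targets for an encoder.
import json
import pandas as pd
import torch
import torch.nn as nn


def build_prompt(task_description: str, df: pd.DataFrame) -> str:
    # Meta-information: column names and dtypes.
    meta = ", ".join(f"{c} ({t})" for c, t in df.dtypes.astype(str).items())
    return (
        f"Downstream task: {task_description}\n"
        f"Columns: {meta}\n"
        "Propose up to 5 features (as pandas expressions over these columns) "
        "likely to be predictive of the target. Return a JSON list of strings."
    )


def query_llm(prompt: str) -> str:
    # Hypothetical placeholder -- substitute a real LLM API call here.
    raise NotImplementedError


def llm_pseudo_targets(task_description: str, df: pd.DataFrame) -> pd.DataFrame:
    """Evaluate LLM-proposed feature expressions to obtain pretext targets."""
    expressions = json.loads(query_llm(build_prompt(task_description, df)))
    return pd.DataFrame({e: df.eval(e) for e in expressions})


def pretrain_encoder(x: torch.Tensor, pseudo_y: torch.Tensor,
                     dim: int = 64, epochs: int = 100) -> nn.Module:
    """Pretext task: predict the LLM-derived features from the raw inputs."""
    encoder = nn.Sequential(nn.Linear(x.shape[1], dim), nn.ReLU(), nn.Linear(dim, dim))
    head = nn.Linear(dim, pseudo_y.shape[1])
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(head(encoder(x)), pseudo_y)
        loss.backward()
        opt.step()
    return encoder  # reuse its representations for the downstream task
```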
Submission Length: Regular submission (no more than 12 pages of main content)
Video: https://drive.google.com/file/d/1pnBWRbs4lOjT_4nGlo2KD4lU_9XCzWm8/view
Code: https://github.com/Sungwon-Han/TST-LLM
Assigned Action Editor: ~Anthony_L._Caterini1
Submission Number: 4017