LLM-Guided Self-Supervised Tabular Learning With Task-Specific Pre-text Tasks

Published: 18 Apr 2025, Last Modified: 18 Apr 2025 · Accepted by TMLR · CC BY 4.0
Abstract: One of the most common approaches for self-supervised representation learning is defining pre-text tasks to learn data representations. Existing works determine pre-text tasks in a "task-agnostic" way, without considering the forthcoming downstream tasks. This offers broad applicability across tasks, but can also create a mismatch between the pre-text and downstream objectives, potentially degrading downstream performance. In this paper, we introduce TST-LLM, a framework that effectively reduces this mismatch when a natural-language description of the downstream task is given without any ground-truth labels. TST-LLM instructs an LLM to use the downstream task's description and the data's meta-information to discover features relevant to the target task. These discovered features are then treated as ground-truth labels to define "target-specific" pre-text tasks. TST-LLM consistently outperforms contemporary baselines, such as STUNT and LFR, with win ratios of 95% and 81%, respectively, across 22 benchmark tabular datasets spanning binary classification, multi-class classification, and regression tasks.
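The sketch below illustrates the idea described in the abstract, not the authors' released implementation: prompt an LLM with the downstream task description and the table's meta-information to propose target-relevant features, then use those features as pseudo-labels for a pretext prediction task. All names here (`build_prompt`, `query_llm`, `llm_pseudo_targets`, `pretrain_encoder`) are hypothetical; see the linked repository for the actual code.

```python
# Conceptual sketch only (assumptions, not the paper's implementation):
# an LLM proposes target-relevant features from the task description and
# column metadata; those features become pretext targets for an encoder.
import json
import pandas as pd
import torch
import torch.nn as nn


def build_prompt(task_description: str, df: pd.DataFrame) -> str:
    # Meta-information: column names and dtypes.
    meta = ", ".join(f"{c} ({t})" for c, t in df.dtypes.astype(str).items())
    return (
        f"Downstream task: {task_description}\n"
        f"Columns: {meta}\n"
        "Propose up to 5 features (as pandas expressions over these columns) "
        "likely to be predictive of the target. Return a JSON list of strings."
    )


def query_llm(prompt: str) -> str:
    # Hypothetical placeholder -- substitute a real LLM API call here.
    raise NotImplementedError


def llm_pseudo_targets(task_description: str, df: pd.DataFrame) -> pd.DataFrame:
    """Evaluate LLM-proposed feature expressions to obtain pretext targets."""
    expressions = json.loads(query_llm(build_prompt(task_description, df)))
    return pd.DataFrame({e: df.eval(e) for e in expressions})


def pretrain_encoder(x: torch.Tensor, pseudo_y: torch.Tensor,
                     dim: int = 64, epochs: int = 100) -> nn.Module:
    """Pretext task: predict the LLM-derived features from the raw inputs."""
    encoder = nn.Sequential(nn.Linear(x.shape[1], dim), nn.ReLU(), nn.Linear(dim, dim))
    head = nn.Linear(dim, pseudo_y.shape[1])
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(head(encoder(x)), pseudo_y)
        loss.backward()
        opt.step()
    return encoder  # reuse its representations for the downstream task
```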
Submission Length: Regular submission (no more than 12 pages of main content)
Video: https://drive.google.com/file/d/1pnBWRbs4lOjT_4nGlo2KD4lU_9XCzWm8/view
Code: https://github.com/Sungwon-Han/TST-LLM
Assigned Action Editor: ~Anthony_L._Caterini1
Submission Number: 4017