Keywords: Large Language Models, Column Type Annotation, Tabular Data
Abstract: This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for labeled tabular data, which makes it well suited to scenarios where data collection is costly or restricted, for example by privacy concerns. However, existing zero-shot models often perform poorly when the number of semantic types is large, and they show a limited understanding of tabular structure. We propose an efficient zero-shot table generation approach that constructs structured pseudo-tables from publicly available data. Fine-tuning an open-source large language model (LLM) on these synthetic tables helps it capture tabular structure and improves its column type annotation. Experiments show that our method outperforms state-of-the-art zero-shot and few-shot models by at least 10.4% and 7%, respectively.
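The abstract does not specify how pseudo-tables are built or serialized for fine-tuning. The following is only a minimal sketch of one plausible pipeline, not the authors' method. It assumes a hypothetical pool of public values per semantic type (`PUBLIC_VALUES`), hypothetical helpers `make_pseudo_table` and `serialize_example`, and a hypothetical prompt format that asks for a JSON mapping from column to type. Columns are sampled independently, which is a simplification: real pseudo-tables would likely need row-level coherence.

```python
import json
import random

# Hypothetical stand-in for "publicly available data": a few example values per
# semantic type. Type names and values are illustrative assumptions only.
PUBLIC_VALUES = {
    "city": ["Paris", "Tokyo", "Lima", "Oslo", "Cairo"],
    "country": ["France", "Japan", "Peru", "Norway", "Egypt"],
    "year": ["1998", "2004", "2011", "2016", "2020"],
    "person": ["Ada Lovelace", "Alan Turing", "Grace Hopper", "Edsger Dijkstra"],
}


def make_pseudo_table(types, n_rows, rng):
    """Build a pseudo-table with one column per requested type; each cell is
    sampled (with replacement) from that type's value pool, independently."""
    columns = {t: [rng.choice(PUBLIC_VALUES[t]) for _ in range(n_rows)] for t in types}
    rows = [[columns[t][i] for t in types] for i in range(n_rows)]
    return rows, list(types)


def serialize_example(rows, labels, label_space):
    """Turn a pseudo-table into a (prompt, target) pair for supervised
    fine-tuning. Headers are neutral (col0, col1, ...) so the model has to
    infer each type from cell values and table layout."""
    header = " | ".join(f"col{j}" for j in range(len(labels)))
    body = "\n".join(" | ".join(r) for r in rows)
    prompt = (
        "Annotate the semantic type of each column.\n"
        f"Candidate types: {', '.join(sorted(label_space))}\n"
        f"{header}\n{body}\n"
        "Answer as JSON mapping column to type."
    )
    target = json.dumps({f"col{j}": t for j, t in enumerate(labels)})
    return prompt, target


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible sampling
    chosen = rng.sample(sorted(PUBLIC_VALUES), k=3)
    rows, labels = make_pseudo_table(chosen, n_rows=4, rng=rng)
    prompt, target = serialize_example(rows, labels, PUBLIC_VALUES.keys())
    print(prompt)
    print(target)
```

Listing every candidate type in the prompt reflects the abstract's point that zero-shot models struggle as the label set grows. A real setup would presumably draw on a much larger type inventory and a fine-tuning framework, neither of which is shown here.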
Submission Number: 75