ZTab: Domain-based Zero-shot Annotation for Table Columns

Published: 04 May 2026, Last Modified: 23 Apr 2026, Venue: ICDE 2026, License: CC BY 4.0
Abstract: This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user-provided labeled training data, making it ideal for scenarios where data collection is costly or restricted, for example because of privacy concerns. However, existing zero-shot models suffer from poor performance when the number of semantic column types (classes) is large, from a poor understanding of tabular structure, and from privacy risks arising from their dependency on high-performance closed-source LLMs. We introduce ZTab, a domain-based zero-shot framework, to address both performance and zero-shot requirements. ZTab takes a domain configuration, given by a set of predefined semantic types plus sample table schemas based on those types, and fine-tunes an annotation LLM on pseudo-tables generated for the sample table schemas. ZTab is domain-based zero-shot in that it does not depend on user-specific labeled training data; therefore, no retraining is needed for a test table coming from a similar domain. We describe three cases for domain-based zero-shot. The domain configuration of ZTab provides a trade-off between the extent of zero-shot and the annotation performance: for a "universal domain" that contains all semantic types, domain-based zero-shot approaches "pure" zero-shot; on the other hand, a "specialized domain" that contains the semantic types for a specific application enables better zero-shot performance within that domain. The source code and datasets are available at https://github.com/hoseinzadeehsan/ZTab.
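To make the idea of a domain configuration concrete, the following is a minimal Python sketch of how pseudo-tables might be generated from a set of semantic types and sample table schemas, and then turned into labeled column examples for fine-tuning. It is an illustration under stated assumptions, not the ZTab implementation: the names `DOMAIN_TYPES`, `SAMPLE_SCHEMAS`, `make_pseudo_table`, and `make_training_examples` are hypothetical, and the small hand-written value pools stand in for whatever value generator the framework actually uses.

```python
import random

# Hypothetical domain configuration (illustrative assumption, not from the
# ZTab repository): each semantic type maps to a small pool of example values.
DOMAIN_TYPES = {
    "city": ["Paris", "Toronto", "Nairobi", "Osaka"],
    "country": ["France", "Canada", "Kenya", "Japan"],
    "year": ["1998", "2005", "2014", "2021"],
}

# Hypothetical sample table schemas: each schema is an ordered list of
# semantic types drawn from DOMAIN_TYPES.
SAMPLE_SCHEMAS = [
    ["city", "country"],
    ["country", "year"],
]


def make_pseudo_table(schema, n_rows, rng):
    """Build one pseudo-table as a list of rows, sampling one value per column type."""
    return [[rng.choice(DOMAIN_TYPES[t]) for t in schema] for _ in range(n_rows)]


def make_training_examples(schemas, tables_per_schema, n_rows, seed=0):
    """Generate pseudo-tables for every schema and emit one labeled example per column."""
    rng = random.Random(seed)
    examples = []
    for schema in schemas:
        for _ in range(tables_per_schema):
            table = make_pseudo_table(schema, n_rows, rng)
            for col_idx, label in enumerate(schema):
                column = [row[col_idx] for row in table]
                # The full table is kept as context; the label comes from the
                # schema, so no user-provided annotation is involved.
                examples.append({"table": table, "column": column, "label": label})
    return examples


if __name__ == "__main__":
    for example in make_training_examples(SAMPLE_SCHEMAS, tables_per_schema=1, n_rows=3):
        print(example["label"], example["column"])
```

In this sketch the labels come for free from the schemas, which is what makes the setup zero-shot with respect to user data: enlarging `DOMAIN_TYPES` moves toward a "universal domain", while restricting it to one application's types corresponds to a "specialized domain".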