Keywords: Table Representation Learning, Contrastive Learning, Natural Language Processing
Abstract: Effective representation learning for tabular data is critical for downstream tasks such as information retrieval, classification, and missing value imputation. However, existing transformer-based models often fail to generalize across in-domain tables: they either preserve schema–value semantics at the cost of robustness or enforce stability while losing fidelity. We propose NAVI (Entropy-aware Alignment via Header–Value Induction), a framework that unifies both desiderata. NAVI introduces header–value segments as the atomic unit of table representation, serialized in an order-independent manner and anchored by global header embeddings. Structure-aware masked segment modeling enforces schema–value dependencies via balanced masking over headers, values, and tokens, while entropy-driven segment alignment aligns low-entropy (domain-coherent) columns with global headers and high-entropy (entity-discriminative) columns with row-specific values. This joint design yields representations that are both consistent and semantically faithful. Extensive experiments on large-scale benchmarks show that NAVI consistently outperforms baselines on generative and discriminative tasks while mitigating schema-level inconsistencies. The source code of NAVI is available at https://anonymous.4open.science/r/navi.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18302