Knowledge-Rich Embeddings for Tabular Learning

Published: 18 Nov 2025, Last Modified: 18 Nov 2025. AITD@EurIPS 2025 Poster. License: CC BY 4.0
Submission Type: Short paper (4 pages)
Keywords: Tabular Learning, Large Language Models, Knowledge Graphs
TL;DR: Extensive experiments on real-life datasets show that knowledge-rich representations boost tabular learning
Abstract: Tables have their own structure, calling for dedicated tabular learning methods with the right inductive bias. These methods outperform direct applications of language models, which struggle with the heterogeneous features typical in tables, such as numerical data or dates. Yet, many tables contain text that refers to real-world entities, and most tabular learning methods ignore the external knowledge that such strings could unlock. Which knowledge-rich representations should tabular learning leverage? While large language models (LLMs) encode implicit factual knowledge, knowledge graphs (KGs) share the relational structure of tables and come with the promise of better-controlled knowledge. Studying tables in the wild, we assemble 105 tabular learning datasets containing text. We find that knowledge-rich representations from LLMs or KGs boost prediction and, combined with simple linear models, markedly outperform strong tabular baselines. Larger LLMs and larger KGs both provide greater gains. On datasets where all entities are linked to a KG, LLMs and KG models of similar size perform similarly, suggesting that the benefit of LLMs over KGs lies in solving the entity-linking problem. Our results highlight that external knowledge is a powerful but underused ingredient for advancing tabular learning.
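As a rough illustration of the pipeline the abstract describes (embed entity-like text columns with a knowledge-rich encoder, then fit a simple linear model), the sketch below uses an off-the-shelf sentence encoder and scikit-learn. It is not the authors' code: the encoder choice, toy table, and column names are all hypothetical.

```python
# Hypothetical sketch: embed an entity-like text column with a pretrained
# language model, concatenate with the numerical features, and fit a
# simple linear model (the paper's exact encoders and datasets may differ).
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Toy table with one text column referring to real-world entities.
df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Tokyo", "Lagos"],
    "population_density": [20_000, 4_100, 6_400, 13_000],
    "target": [1.2, 0.8, 1.5, 0.9],
})

# Knowledge-rich text embeddings from a generic sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
text_emb = encoder.encode(df["city"].tolist())  # shape (n_rows, emb_dim)

# Combine embeddings with the numerical column and fit a linear model.
X = np.hstack([text_emb, df[["population_density"]].to_numpy()])
y = df["target"].to_numpy()
scores = cross_val_score(RidgeCV(), X, y, cv=2)
print("CV R^2:", scores.mean())
```

A KG-based variant would swap the sentence encoder for pretrained knowledge-graph entity embeddings (after linking each string to its entity), keeping the same linear model on top.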
Submission Number: 15