Introducing the Observatory Library for End-to-End Table Embedding Inference
Keywords: Table Representation Learning, Tabular Language Models, End-to-End Table Embedding Inference
TL;DR: To the best of our knowledge, Observatory is the first library that streamlines end-to-end inference of table embeddings at different levels and integrates many common practices of table serialization and input preprocessing.
Abstract: Transformer-based table embedding models have become prevalent for a wide range of applications involving tabular data. Such models require a table to be serialized as a sequence of tokens for model ingestion and embedding inference. Different downstream tasks require different kinds or levels of embeddings, such as column or entity embeddings. Hence, various serialization and encoding methods have been proposed and implemented. Surprisingly, this conceptually simple process of creating table embeddings is not straightforward in practice for a few reasons: 1) a model may not natively expose a certain level of embedding; 2) choosing the correct table serialization and input preprocessing methods is difficult because many are available; and 3) tables with a massive number of rows and columns exceed the input length limit of models. In this work, we extend Observatory, a framework for characterizing embeddings of relational tables, by streamlining end-to-end inference of table embeddings, which eases the use of table embedding models in practice. The codebase of Observatory is publicly available at https://github.com/superctj/observatory.
Submission Number: 23
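
Illustrative sketch (not part of Observatory): to make the serialization and input-limit issues in the abstract concrete, the following minimal Python example serializes a table row by row and drops trailing rows once a token budget is reached. It does not use the Observatory API. The function names `count_tokens` and `serialize_rows`, the `|` and `[SEP]` separators, and the whitespace-based token count are all assumptions made for illustration; a real pipeline would count tokens with the model's own subword tokenizer.

```python
from typing import Sequence


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real subword tokenizer.
    return len(text.split())


def serialize_rows(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    max_tokens: int,
    cell_sep: str = "|",
    row_sep: str = "[SEP]",
) -> str:
    """Serialize a table row by row, keeping only the rows that fit the budget."""

    def join_cells(cells: Sequence[str]) -> str:
        return f" {cell_sep} ".join(str(c) for c in cells)

    # The header is always kept so the model sees the column names.
    parts = [join_cells(header)]
    used = count_tokens(parts[0])

    for row in rows:
        piece = join_cells(row)
        # Each added row costs its own tokens plus one row separator.
        cost = count_tokens(row_sep) + count_tokens(piece)
        if used + cost > max_tokens:
            break  # Remaining rows would exceed the input limit.
        parts.append(piece)
        used += cost

    return f" {row_sep} ".join(parts)


header = ["city", "country"]
rows = [["Paris", "France"], ["Lima", "Peru"], ["Oslo", "Norway"]]
serialized = serialize_rows(header, rows, max_tokens=8)
print(serialized)  # city | country [SEP] Paris | France
```

With a budget of 8 tokens, only the header and the first row fit; raising `max_tokens` keeps more rows. Column-wise serialization or sampling rows instead of truncating them are equally plausible choices, which is the kind of variation the abstract refers to.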