On Short Textual Value Column Representation Using Symbol Level Language Models

Published: 10 Oct 2024, Last Modified: 26 Oct 2024
Venue: TRL @ NeurIPS 2024 Poster
License: CC BY 4.0
Keywords: Symbol Level Language Models, Column Matching
TL;DR: Presenting a symbol level language model-based representation that is efficient for string-type database columns and effective for tasks like column matching.
Abstract: String-type database columns containing short textual values are crucial for storing and managing a wide range of information across applications; for example, they store categories, labels, enumerations, codes, and abbreviations. Here, we discuss a string column representation based on symbol level language models that captures the symbol level "distribution" of a column's textual values. These language models are known for their good prediction quality and for their efficiency in memory footprint and runtime, while being theoretically justified. We focus on a column matching application and provide empirical evidence of their usefulness.
Submission Number: 52
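
To make the idea in the abstract concrete, here is a minimal sketch of representing a column by a symbol level model and comparing two columns through it. It rests on stated assumptions and is not the paper's method: the abstract does not name a specific model, so a fixed-order character bigram model with additive smoothing stands in for it, and the class `CharBigramModel`, the function `mismatch`, the `alpha` parameter, and the start/end markers are all hypothetical names chosen for illustration.

```python
import math
from collections import Counter

BOS = "\x02"  # assumed start-of-value marker (not from the paper)
EOS = "\x03"  # assumed end-of-value marker (not from the paper)


class CharBigramModel:
    """Character bigram model of one column, with additive smoothing."""

    def __init__(self, values, vocab_size, alpha=1.0):
        self.vocab_size = vocab_size  # number of distinct predictable symbols
        self.alpha = alpha            # smoothing constant (assumption)
        self.pair_counts = Counter()
        self.context_counts = Counter()
        for value in values:
            symbols = [BOS] + list(value) + [EOS]
            for prev, cur in zip(symbols, symbols[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def bits_per_symbol(self, values):
        """Average code length, in bits per symbol, of `values` under this model."""
        total_bits, n = 0.0, 0
        for value in values:
            symbols = [BOS] + list(value) + [EOS]
            for prev, cur in zip(symbols, symbols[1:]):
                num = self.pair_counts[(prev, cur)] + self.alpha
                den = self.context_counts[prev] + self.alpha * self.vocab_size
                total_bits -= math.log2(num / den)
                n += 1
        if n == 0:
            raise ValueError("cannot score an empty column")
        return total_bits / n


def mismatch(col_a, col_b):
    """Symmetric score: lower means the two columns look more alike."""
    # Shared alphabet: every character in either column, plus the end marker.
    vocab_size = len(set("".join(col_a) + "".join(col_b))) + 1
    model_a = CharBigramModel(col_a, vocab_size)
    model_b = CharBigramModel(col_b, vocab_size)
    return 0.5 * (model_a.bits_per_symbol(col_b) + model_b.bits_per_symbol(col_a))


# Made-up example columns, for illustration only.
states_a = ["CA", "NY", "TX", "WA"]
states_b = ["FL", "OR", "NV", "CA"]
phones = ["555-0101", "555-0199", "555-0142"]
print(mismatch(states_a, states_b), mismatch(states_a, phones))
```

In this sketch each column is summarized by its transition counts, and two columns are scored by how cheaply each model encodes the other column's values. A real system would likely use a stronger model and a calibrated decision threshold; those choices are outside what the abstract specifies.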