Product Entity Matching via Tabular DataOpen Website

Published: 01 Jan 2023, Last Modified: 04 Dec 2023CIKM 2023Readers: Everyone
Abstract: Product Entity Matching (PEM)--a subfield of record linkage that focuses on linking records that refer to the same product--is a challenging task for many entity matching models. For example, recent transformer models report a near-perfect performance score on many datasets while their performance is the lowest on PEM datasets. In this paper, we study PEM under the common setting where the information is spread over text and tables. We show that adding tables can enrich the existing PEM datasets and those tables can act as a bridge between the entities being matched. We also propose TATEM, an effective solution that leverages Pre-trained Language Models (PLMs) with a novel serialization technique to encode tabular product data and an attribute ranking module to make our model more data-efficient. Our experiments on both current benchmark datasets and our proposed datasets show significant improvements compared to state-of-the-art methods, including Large Language Models (LLMs) in zero-shot and few-shot settings.
0 Replies

Loading