Entity Matching with Large Language Models as Weak and Strong Labellers

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · ADBIS (Short Papers) 2024 · CC BY-SA 4.0
Abstract: A number of recent studies have shown that pre-trained large language models (LLMs) display highly competitive performance on entity matching tasks while acting in a zero-shot manner, thus reducing the need for labelled training data. However, the most capable LLMs (e.g. GPT-4) come at a prohibitively high cost, whilst weaker LLMs (e.g. GPT-3) are significantly cheaper yet still provide reasonable performance without training data. In this study, we consider a scenario in which a budget-constrained data practitioner with limited access to training data attempts to perform LLM-based inference for an entity matching task. Within budget, they can afford to have only a portion of the data labelled by a stronger LLM. We cast this problem as that of deciding when to defer to a stronger LLM. In this scenario, we assume that a weak LLM provides cheap but noisy labels, while a strong LLM provides labels with higher precision but at much greater cost. To address this, we develop a technique capable of intelligently allocating labels between strong and weak LLM labellers to maximise performance within the given budget. We show that, given a small amount of already labelled data, it is possible to reliably learn when the weak LLM is likely to be inaccurate, and thus defer to the stronger, more expensive LLM. Employing GPT-3 and GPT-4 on four popular entity matching benchmarks, we find that, given a specific budget to spend on the strong LLM, our approach generally performs better than a random labelling allocation, and outperforms an in-context learning strategy. As such, this work outlines a simple yet effective methodology whereby real-world practitioners can employ LLMs in entity matching tasks.
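A minimal sketch of the deferral idea the abstract describes, under assumptions not taken from the paper itself: record pairs are represented as numeric feature vectors, the error predictor is a scikit-learn logistic regression, and `strong_llm_label` is a hypothetical wrapper around a strong-LLM (e.g. GPT-4) matching call. A small labelled seed set is used to learn where the weak LLM's labels tend to be wrong, and the strong-LLM budget is spent on the pool items with the highest predicted error probability.

```python
# Sketch of budget-constrained deferral between a weak and a strong LLM labeller.
# Assumptions (not from the paper's text): pairs are featurised as vectors, the
# error predictor is a logistic regression, and strong_llm_label is a hypothetical
# wrapper around the strong model's API.
import numpy as np
from sklearn.linear_model import LogisticRegression


def strong_llm_label(pair):
    """Hypothetical call to the strong (expensive) LLM for one record pair."""
    raise NotImplementedError("Replace with a real API call, e.g. to GPT-4.")


def allocate_labels(seed_X, seed_y, seed_weak, pool_X, pool_pairs, pool_weak, budget):
    """Label the pool, deferring up to `budget` items to the strong LLM.

    seed_X, seed_y : features and gold labels for the small labelled seed set
    seed_weak      : weak-LLM labels on the seed set
    pool_X         : features for the unlabelled pool
    pool_pairs     : the raw record pairs (sent to the strong LLM when deferring)
    pool_weak      : weak-LLM labels on the pool
    budget         : number of pool items we can afford to send to the strong LLM
    """
    # Train an error predictor: target is 1 where the weak LLM disagreed with gold.
    weak_errors = (np.asarray(seed_weak) != np.asarray(seed_y)).astype(int)
    error_model = LogisticRegression(max_iter=1000).fit(seed_X, weak_errors)

    # Estimate, for each pool item, the probability that the weak label is wrong.
    p_error = error_model.predict_proba(pool_X)[:, 1]

    # Defer the `budget` items with the highest predicted error to the strong LLM;
    # keep the cheap weak labels everywhere else.
    defer_idx = np.argsort(-p_error)[:budget]
    labels = np.asarray(pool_weak).copy()
    for i in defer_idx:
        labels[i] = strong_llm_label(pool_pairs[i])
    return labels
```

The ranking-by-predicted-error step is one plausible reading of "learning when the weak LLM is likely to be inaccurate"; the paper's actual allocation rule and feature representation may differ.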