Towards Long-Text Entity Resolution with Chain-of-Thought Knowledge Augmentation from Large Language Models

Published: 01 Jan 2024 · Last Modified: 17 Apr 2025 · DASFAA (5) 2024 · CC BY-SA 4.0
Abstract: Entity resolution is a critical problem in data integration. Recently, approaches based on pre-trained language models have shown leading performance and have become the mainstream solution. When facing entities with long-text descriptions, since language models have a limited input context length, existing approaches tend to rely on syntax-based methods, e.g., TF-IDF or an auxiliary model, to select the description segments to be input into the matcher. However, such naive filtering approaches lack interaction with the matching phase and thus may drop key information needed to compute semantic similarities, degrading the final matching quality. To solve the problem of long-text entity resolution, we propose a novel framework called CoTer, which follows a chunk-then-aggregate architecture. CoTer first chunks the long-text descriptions and feeds them into the encoder to obtain chunked representations. It then implicitly highlights the semantically key information in the chunked representations by injecting Chain-of-Thought reasoning knowledge from a Large Language Model. Finally, CoTer fuses the chunked representations and the reasoning knowledge in the decoder to output the matching probabilities. Extensive experiments show that CoTer achieves leading performance compared with state-of-the-art solutions.
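The chunking step of a chunk-then-aggregate pipeline can be sketched as below. This is a minimal illustration, not CoTer's actual implementation: the whitespace tokenization, `chunk_size`, and `stride` (overlap) values are assumptions; the paper's encoder would operate on subword tokens with its own window settings.

```python
def chunk_description(text: str, chunk_size: int = 64, stride: int = 48) -> list[str]:
    """Split a long entity description into overlapping chunks.

    Illustrative stand-in for the chunking stage of a chunk-then-aggregate
    architecture: each chunk fits within a language model's input limit,
    and the overlap (chunk_size - stride tokens) reduces the chance that
    key information is cut at a chunk boundary.
    """
    tokens = text.split()  # toy tokenizer; a real system uses subword tokens
    chunks = []
    start = 0
    while True:
        chunks.append(" ".join(tokens[start : start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last chunk reaches the end of the description
        start += stride
    return chunks
```

Each chunk would then be encoded separately, and the per-chunk representations aggregated downstream (in CoTer's case, fused with LLM reasoning knowledge in the decoder).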
