Entity Exchange in the Wild: A Diagnostic Study of LLM-Based Real-World Conversational Entity Extraction
Keywords: entity extraction, entity detection, llm, dialog systems
Abstract: Entity extraction from spoken customer–agent conversations is increasingly driving automation in contact centers. In these settings, extraction errors can trigger incorrect system actions, including database updates, verification failures, and unintended workflow execution. While prior work has examined the impact of transcription noise and cross-turn reasoning, it has not systematically analyzed how entity-exchange phenomena themselves shape extraction performance.
We model conversational entity exchange along three orthogonal axes: Initiation (how an entity becomes relevant in the dialogue), Evolution (how commitment to an entity’s value develops or changes across turns), and Articulation (how the final committed value is expressed in surface form).
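For concreteness, a minimal annotation schema for the three axes might look like the sketch below. The axis names come from the abstract; the member values (e.g. REVISED, DIGIT_BY_DIGIT) are illustrative assumptions drawn from the failure modes described later, not the paper's exact label set.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical category values for each axis; only the axis names
# (Initiation, Evolution, Articulation) are taken from the paper.
class Initiation(Enum):
    AGENT_PROMPTED = "agent_prompted"          # agent asks for the value
    CUSTOMER_VOLUNTEERED = "customer_volunteered"

class Evolution(Enum):
    STABLE = "stable"      # value stated once and never changed
    REVISED = "revised"    # value corrected or updated across turns

class Articulation(Enum):
    PLAIN = "plain"                    # "my zip is 94110"
    DIGIT_BY_DIGIT = "digit_by_digit"  # "nine four one one zero"
    ENCODED = "encoded"                # e.g. "B as in bravo"

@dataclass
class EntityExchange:
    entity_type: str          # e.g. "phone_number", "date_of_birth"
    initiation: Initiation
    evolution: Evolution
    articulation: Articulation
    final_value: str          # the final committed value to be extracted
```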
We evaluate 16 large language models on 6,387 real-world customer–agent conversations spanning 12 entity types across numeric, alphanumeric, temporal, and free-text categories. Within a single model, performance varies by as much as 50–60% depending solely on the underlying entity-exchange phenomena. The most severe failures occur when entity values are revised during the interaction and the model must distinguish intermediate mentions from the final committed value. Even in the absence of revision, digit-by-digit and encoded expressions remain persistent sources of error.
Error-Aware prompting improves extraction across all three axes, yielding average gains of up to 6.4% across models. Taken together, this work provides a structured framework for benchmarking entity extraction in real-world deployments and for isolating systematic failure modes grounded in conversational structure.
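The abstract does not reproduce the Error-Aware prompt itself; the sketch below is one plausible realization, in which the prompt enumerates the failure modes identified along the three axes. All wording, the constant ERROR_AWARE_INSTRUCTIONS, and the helper build_prompt are assumptions for illustration.

```python
# Minimal sketch of error-aware prompting: the extraction instructions
# warn the model about the failure modes found along each axis.
ERROR_AWARE_INSTRUCTIONS = """
Extract the {entity_type} from the conversation below.
Be careful about these known error sources:
- Revisions: if the customer corrects or updates the value, return only
  the final committed value, never an earlier intermediate mention.
- Digit-by-digit speech: values may be spoken one digit or letter at a
  time; reassemble them into a single normalized value.
- Encodings: expand spelled or phonetic encodings (e.g. "B as in bravo").
Return only the normalized value, or NONE if it is absent.
"""

def build_prompt(entity_type: str, transcript: str) -> str:
    # Prepend the error-aware instructions to the conversation transcript.
    return ERROR_AWARE_INSTRUCTIONS.format(entity_type=entity_type) + "\n" + transcript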
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 461