Entity Exchange in the Wild: A Diagnostic Study of LLM-Based Real-World Conversational Entity Extraction
Keywords: entity extraction, entity detection, llm, dialog systems
Abstract: Entity extraction from spoken customer–agent conversations is increasingly driving automation in contact centers. In these settings, extraction errors can trigger incorrect system actions, including database updates, verification failures, and unintended workflow execution. While prior work has examined the impact of transcription noise and cross-turn reasoning, it has not systematically analyzed how entity-exchange phenomena themselves shape extraction performance.
We model conversational entity exchange along three orthogonal axes: Initiation (how an entity becomes relevant in the dialogue), Evolution (how commitment to an entity’s value develops or changes across turns), and Articulation (how the final committed value is expressed in surface form).
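For concreteness, a minimal annotation schema for the three axes might look like the sketch below. The axis names come from the abstract; the member values (e.g. REVISED, DIGIT_BY_DIGIT) are illustrative assumptions drawn from the failure modes described later, not the paper's exact label set.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical category values for each axis; only the axis names
# (Initiation, Evolution, Articulation) are taken from the paper.
class Initiation(Enum):
    AGENT_PROMPTED = "agent_prompted"          # agent asks for the value
    CUSTOMER_VOLUNTEERED = "customer_volunteered"

class Evolution(Enum):
    STABLE = "stable"      # value stated once and never changed
    REVISED = "revised"    # value corrected or updated across turns

class Articulation(Enum):
    PLAIN = "plain"                    # "my zip is 94110"
    DIGIT_BY_DIGIT = "digit_by_digit"  # "nine four one one zero"
    ENCODED = "encoded"                # e.g. "B as in bravo"

@dataclass
class EntityExchange:
    entity_type: str          # e.g. "phone_number", "date_of_birth"
    initiation: Initiation
    evolution: Evolution
    articulation: Articulation
    final_value: str          # the final committed value to be extracted
```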
We evaluate 16 large language models on 6,387 real-world customer–agent conversations spanning 12 entity types across numeric, alphanumeric, temporal, and free-text categories. Within a single model, performance varies by as much as 50–60% depending solely on the underlying entity-exchange phenomena. The most severe failures occur when entity values are revised during the interaction and the model must distinguish intermediate mentions from the final committed value. Even in the absence of revision, digit-by-digit and encoded expressions remain persistent sources of error.
Error-Aware prompting improves extraction across all three axes, yielding average gains of up to 6.4% across models. Taken together, this work provides a structured framework for benchmarking entity extraction in real-world deployments and for isolating systematic failure modes grounded in conversational structure.
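The abstract does not reproduce the Error-Aware prompt itself; the sketch below is one plausible realization, in which the prompt enumerates the failure modes identified along the three axes. All wording, the constant ERROR_AWARE_INSTRUCTIONS, and the helper build_prompt are assumptions for illustration.

```python
# Minimal sketch of error-aware prompting: the extraction instructions
# warn the model about the failure modes found along each axis.
ERROR_AWARE_INSTRUCTIONS = """
Extract the {entity_type} from the conversation below.
Be careful about these known error sources:
- Revisions: if the customer corrects or updates the value, return only
  the final committed value, never an earlier intermediate mention.
- Digit-by-digit speech: values may be spoken one digit or letter at a
  time; reassemble them into a single normalized value.
- Encodings: expand spelled or phonetic encodings (e.g. "B as in bravo").
Return only the normalized value, or NONE if it is absent.
"""

def build_prompt(entity_type: str, transcript: str) -> str:
    # Prepend the error-aware instructions to the conversation transcript.
    return ERROR_AWARE_INSTRUCTIONS.format(entity_type=entity_type) + "\n" + transcript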
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 461