Does Collaborative Human–LM Dialogue Generation Help Information Extraction from Human–Human Dialogues?

Published: 10 Jul 2024, Last Modified: 26 Aug 2024
Venue: COLM
License: CC BY-NC-SA 4.0
Research Area: Data, LMs and interactions
Keywords: Data Synthesis, Data Annotation, Human–LM Collaboration
TL;DR: The study introduces a human-in-the-loop dialogue generation framework that significantly improves task performance by synthesizing realistic call center dialogues, demonstrating the value of human-LM collaboration in data generation and annotation.
Abstract: The capabilities of pretrained language models (LMs) have opened opportunities to explore new application areas, but applications involving human-human interaction are limited because most data is protected from public release for privacy reasons. Problem-solving human-human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections, preventing successful domain transfer. To support information extraction (IE) for a private call center dataset (AIC), we introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues. In IE experiments with AIC dialogues, we observe a 25% relative improvement in F1 after augmenting a small set of real human-human conversations with synthetic data. In controlled experiments, we compare training with our human-in-the-loop-synthesized data vs. fully automatically LM-generated data and find that human collaboration adds value in both the generation and annotation stages. We release code and our synthetic dataset to illustrate the complexity of call center conversations and to encourage development of complex dialogue datasets that are more representative of natural data.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1208