Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents

ACL ARR 2024 June Submission 111

05 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: We describe an approach for aligning an LLM-based dialogue agent for long-term social dialogue, where only a single global score is given by the user at the end of the session. We propose using denser, naturally occurring multimodal communicative signals as local implicit feedback to improve turn-level utterance generation. Our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies on two large-scale datasets to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
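To make the decomposition idea concrete, below is a minimal sketch of a turn-level reward objective in the spirit the abstract describes: per-turn rewards are constrained to sum to the session-level global score, while a shaping term nudges them toward a local implicit multimodal signal (e.g., a per-turn listener-affect score). The names `TurnRewardModel` and `geli_style_loss`, the L2 forms of both terms, and the shaping weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class TurnRewardModel(nn.Module):
    """Hypothetical turn-level reward head over precomputed turn embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, turn_embs: torch.Tensor) -> torch.Tensor:
        # turn_embs: (num_turns, dim) -> one scalar reward per turn, shape (num_turns,)
        return self.head(turn_embs).squeeze(-1)


def geli_style_loss(per_turn_r, global_reward, local_signal, shaping_weight=0.1):
    """Sketch of a decomposition objective (assumed form, not the paper's):
    (1) per-turn rewards should sum to the session-level global score (GE);
    (2) a shaping term pulls per-turn rewards toward the local implicit
        multimodal signal (LI), here a per-turn scalar."""
    decomposition = (per_turn_r.sum() - global_reward) ** 2
    shaping = ((per_turn_r - local_signal) ** 2).mean()
    return decomposition + shaping_weight * shaping


# Toy usage with random stand-ins for turn embeddings and signals.
model = TurnRewardModel(dim=16)
turn_embs = torch.randn(8, 16)          # 8 turns in a session
global_reward = torch.tensor(3.5)       # single end-of-session user score
local_signal = torch.rand(8)            # e.g., per-turn listener-affect scores
loss = geli_style_loss(model(turn_embs), global_reward, local_signal)
loss.backward()
```

The resulting turn-level reward model would then stand in for the usual preference-trained reward model inside a standard RLHF loop (e.g., PPO) over the dialogue agent's turns.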
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Dialogue and Interactive Systems, Multimodality and Language Grounding to Vision, Language Modeling
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 111