Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents

ACL ARR 2024 June Submission 111

05 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: We describe an approach for aligning an LLM-based dialogue agent for long-term social dialogue, where only a single global score is given by the user at the end of the session. We propose using denser, naturally occurring multimodal communicative signals as local implicit feedback to improve turn-level utterance generation. Our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the RLHF pipeline to improve an LLM-based dialogue agent. We run quantitative and qualitative human studies on two large-scale datasets to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
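To make the decomposition idea concrete, below is a minimal sketch of a turn-level reward objective in the spirit the abstract describes: per-turn rewards are constrained to sum to the session-level global score, while a shaping term nudges them toward a local implicit multimodal signal (e.g., a per-turn listener-affect score). The names `TurnRewardModel` and `geli_style_loss`, the L2 forms of both terms, and the shaping weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class TurnRewardModel(nn.Module):
    """Hypothetical turn-level reward head over precomputed turn embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, turn_embs: torch.Tensor) -> torch.Tensor:
        # turn_embs: (num_turns, dim) -> one scalar reward per turn, shape (num_turns,)
        return self.head(turn_embs).squeeze(-1)


def geli_style_loss(per_turn_r, global_reward, local_signal, shaping_weight=0.1):
    """Sketch of a decomposition objective (assumed form, not the paper's):
    (1) per-turn rewards should sum to the session-level global score (GE);
    (2) a shaping term pulls per-turn rewards toward the local implicit
        multimodal signal (LI), here a per-turn scalar."""
    decomposition = (per_turn_r.sum() - global_reward) ** 2
    shaping = ((per_turn_r - local_signal) ** 2).mean()
    return decomposition + shaping_weight * shaping


# Toy usage with random stand-ins for turn embeddings and signals.
model = TurnRewardModel(dim=16)
turn_embs = torch.randn(8, 16)          # 8 turns in a session
global_reward = torch.tensor(3.5)       # single end-of-session user score
local_signal = torch.rand(8)            # e.g., per-turn listener-affect scores
loss = geli_style_loss(model(turn_embs), global_reward, local_signal)
loss.backward()
```

The resulting turn-level reward model would then stand in for the usual preference-trained reward model inside a standard RLHF loop (e.g., PPO) over the dialogue agent's turns.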
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Dialogue and Interactive Systems, Multimodality and Language Grounding to Vision, Language Modeling
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 111