‘No’ Matters: Out-of-Distribution Detection in Multimodality Multi-Turn Interactive Dialogue

‘No’ Matters: Out-of-Distribution Detection in Multimodality Multi-Turn Interactive Dialogue

ACL ARR 2025 February Submission772 Authors

11 Feb 2025 (modified: 09 May 2025)ACL ARR 2025 February SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Out-of-distribution (OOD) detection in multimodal contexts is essential for identifying deviations in different modalities, particularly for interactive dialogue systems in real-life interactions, where the systems are usually infeasible to deploy large language models (LLMs) to generate dialogue responses due to data privacy and ethical issues. This paper aims to improve label detection that involves multi-round long dialogues by efficiently detecting OOD dialogues and images. We introduce a novel scoring framework named Dialogue Image Aligning and Enhancing Framework (DIAEF) that integrates the visual language models with the novel proposed scores that detect OOD in two key scenarios (1) mismatches between the dialogue and image input pair and (2) input pairs with previously unseen labels. Our experimental results, derived from various benchmarks, demonstrate that integrating image and multi-round dialogue OOD detection is more effective with previously unseen labels than using either modality independently. In the presence of mismatched pairs, our proposed score effectively identifies these mismatches and demonstrates strong robustness in long dialogues. This approach enhances domain-aware, adaptive conversational agents and establishes baselines for future studies.

Paper Type: Long

Research Area: Dialogue and Interactive Systems

Research Area Keywords: multi-modal dialogue systems; dialogue detection metrics; NLP in resource-constrained settings; image text matching; vision question answering

Contribution Types: Model analysis & interpretability, NLP engineering experiment

Languages Studied: English

Submission Number: 772

Loading