Enhancing multi-modal fusion in visual dialog via sample debiasing and feature interaction

Published: 01 Jan 2024, Last Modified: 30 Sept 2024, Inf. Fusion 2024, CC BY-SA 4.0
Abstract: Highlights
• We propose a visual dialog model termed CS-PAF.
• CS-PAF enhances fusion balance via counterfactual sample generation.
• CS-PAF enhances fusion sufficiency by stacking fusion units in parallel.
• Extensive experiments demonstrate the superiority of CS-PAF over other methods.