Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation
Highlights
• Novel VIKDF approach enhances dialogue generation with visual implicit knowledge.
• Implicit Query Transformer effectively distills visual information into LLMs.
• Bidirectional Variational Fusion integrates visual implicit knowledge seamlessly.
• VIKDF outperforms state-of-the-art models in zero-resource dialogue scenarios.