Abstract: Highlights
• We propose a Multi-task Hierarchical Heterogeneous Fusion Framework for multimodal summarization.
• Fine-grained semantics and cross-modality correlations are explored for summary generation.
• The proposed framework outperforms the baselines on overlap metrics and diversity tests.