Abstract: Highlights
• Social images have multimodal properties, encompassing visual content, textual descriptions, and social relationships.
• There is a hierarchical relationship between the content within individual images and the social relationships among different images.
• We propose a novel Hierarchical Heterogeneous Graph Neural Network to exploit the hierarchical relationship between diverse modalities.
• Our approach can capture fine-grained correlations within each image and heterogeneous relationships among images.
• Experimental results demonstrate the superiority of our approach for network-oriented multimodal tasks.