Abstract: Our goal is to develop an effective image aesthetics assessment (IAA) model. In the current Internet era, it has become easier to obtain a text description of an image, and with the dual-modal support of image and text, an IAA model can assess images more accurately than with the image alone. To this end, we design a multimodal feature-driven guided image aesthetics assessment model (MFD). First, multi-modal features are extracted by a feature extraction sub-network: image-driven aesthetic features and content features, and text-driven semantic features. Each feature captures implicit characteristics corresponding to a different level at which the human brain analyzes an object. Second, these multi-modal features are combined into multi-modal combination features that carry all of these characteristics. Finally, the multi-modal combination features are used for aesthetic assessment prediction. Experimental results on public image aesthetics assessment databases demonstrate the superiority of our model.
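The three stages described above (extract, combine, predict) can be illustrated with a minimal sketch. It assumes the features are already extracted as plain vectors, that they are combined by simple concatenation, and that prediction is a single linear layer. None of these choices is confirmed by the abstract, and the names `fuse_features` and `predict_score` are hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    aesthetic: Sequence[float],
    content: Sequence[float],
    semantic: Sequence[float],
) -> List[float]:
    """Combine the three feature vectors into one combination feature.

    Assumption: fusion by concatenation. The abstract does not say
    how MFD combines its features.
    """
    return [*aesthetic, *content, *semantic]


def predict_score(
    fused: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Map the combination feature to a scalar aesthetic score.

    Assumption: a single linear layer stands in for the prediction head.
    """
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    return sum(f * w for f, w in zip(fused, weights)) + bias


# Toy vectors standing in for the outputs of the feature extraction sub-network.
aesthetic = [0.2, 0.8]      # image-driven aesthetic features
content = [0.5]             # image-driven content features
semantic = [0.1, 0.4, 0.3]  # text-driven semantic features

fused = fuse_features(aesthetic, content, semantic)
score = predict_score(fused, weights=[1.0] * len(fused))
```

In practice the weights would be learned from labeled aesthetic scores; the uniform weights here only make the data flow easy to follow.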