AMFMER: A multimodal full transformer for unifying aesthetic assessment tasks

Jin Qi, Can Su, Xiaoxuan Hu, Mengwei Chen, Yanfei Sun, Zhenjiang Dong, Tianliang Liu, Jiebo Luo

Published: 2025, Last Modified: 25 Jul 2025. Signal Process. Image Commun. 2025. CC BY-SA 4.0

Abstract: Highlights
• A novel end-to-end multimodal transformer framework is proposed for aesthetics prediction.
• A multimodal fusion layer is proposed to reflect the complex relationships among multimodal features.
• A new aesthetically oriented attention block is proposed for the image transformer.
• A new aesthetic comments dataset on Western painting is presented.