AMFMER: A multimodal full transformer for unifying aesthetic assessment tasks

Published: 01 Jan 2025, Last Modified: 25 Jul 2025, Signal Process. Image Commun. 2025, CC BY-SA 4.0
Abstract: Highlights
• A novel end-to-end multimodal transformer framework is proposed for aesthetics prediction.
• A multimodal fusion layer is proposed to reflect the complex relationships among multimodal features.
• A new aesthetically oriented attention block is proposed for the image transformer.
• A new aesthetic comments dataset on Western painting is presented.