AesthetiX-RAG: Causally-Grounded Emotion Recognition and Explanation in Paintings via Artist–Style Knowledge and Faithful Visual Evidence
Abstract: Art provides a visual medium for emotional expression. In paintings, such expression is conveyed through compositional structure, symbolic elements, and stylistic features. However, existing computational methods for understanding artwork often rely on semantic content and low-level visual features. Consequently, these methods may capture only a limited part of the emotional expression embedded in stylistic and compositional features. In this work, we present AesthetiX-RAG, a causally grounded retrieval-augmented framework for emotion recognition and explanation in paintings. The proposed framework employs an Artist–Style–Motif–Emotion (ASME) graph to model relationships among artists, stylistic traditions, symbolic motifs, and emotional expression. The artist–style priors derived from the ASME graph are projected into control tokens and fused with visual representations through multi-head attention. The fused representation is used to predict the emotion label. Finally, a retrieval-augmented generator combines the predicted emotion label with faithful visual evidence and retrieved artist–style knowledge to generate a grounded natural-language explanation for the prediction. We also introduce a new dataset, AesthetiX-5K, to support emotion recognition and explanation in paintings. The dataset contains 5,116 paintings spanning 27 artistic styles, 23 artists, and 10 genres, with each sample annotated with an emotion label and a human-written rationale. Experiments on AesthetiX-5K and existing art-emotion datasets validate the effectiveness of the proposed framework. The code and dataset will be made publicly available.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tatsuya_Harada1
Submission Number: 9805