Keywords: Artwork Analysis · Multimodal Learning · Knowledge Graphs · Structured Context
TL;DR: ArtRAG: Structured Context Retrieval-Augmented Framework for Artwork Explanation Generation
Abstract: Generating detailed descriptions for paintings is a complex challenge combining computer vision and natural language processing, as it requires blending visual analysis with cultural, artistic, and historical contexts. Unlike general image captioning, artwork explanations demand deeper contextual integration. Existing models often rely on static attributes such as artist or style, which fail to capture the complexity of art. To overcome these limitations, we present ArtRAG, a novel framework that integrates context-aware knowledge into a Retrieval-Augmented Generation (RAG) pipeline. ArtRAG constructs a rich knowledge graph from art-related texts, mapping entities such as artists, movements, and historical and cultural events, along with their relationships, and attaching a description to each element. During inference, the framework retrieves relevant graph elements to dynamically incorporate cultural, historical, and stylistic insights. Evaluated on the SemArt and Artpedia datasets, ArtRAG generates rich, multi-perspective descriptions without requiring any training, achieving competitive performance compared to state-of-the-art models while bridging the gap between visual recognition and contextual understanding.
Submission Number: 2
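To make the pipeline described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy in-memory graph whose nodes and edges each carry a description, and it swaps in plain keyword overlap for retrieval, since the abstract does not specify the retrieval method. All names (`KnowledgeGraph`, `retrieve`, `build_context`) and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str          # e.g. "artist", "movement", "event"
    description: str


@dataclass
class KnowledgeGraph:
    # Hypothetical toy structure: nodes and edges both carry descriptions.
    nodes: dict = field(default_factory=dict)   # name -> Node
    edges: list = field(default_factory=list)   # (source, relation, target, description)

    def add_node(self, name, kind, description):
        self.nodes[name] = Node(name, kind, description)

    def add_edge(self, source, relation, target, description):
        self.edges.append((source, relation, target, description))

    def neighbors(self, name):
        # Every edge that touches the given node, in insertion order.
        return [e for e in self.edges if name in (e[0], e[2])]


def tokenize(text):
    # Lowercased word set with surrounding punctuation stripped.
    words = (w.strip(".,;:()'\"").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(graph, query, top_k=2):
    # Assumption: keyword overlap stands in for the real retrieval step.
    q = tokenize(query)
    scored = []
    for node in graph.nodes.values():
        overlap = len(q & tokenize(node.name + " " + node.description))
        if overlap:
            scored.append((overlap, node.name))
    scored.sort(key=lambda s: (-s[0], s[1]))
    seeds = [name for _, name in scored[:top_k]]
    # Expand the seed nodes to their incident edges, without duplicates.
    edges = []
    for name in seeds:
        for e in graph.neighbors(name):
            if e not in edges:
                edges.append(e)
    return seeds, edges


def build_context(graph, seeds, edges):
    # Flatten the retrieved nodes and relationships into prompt text.
    lines = [f"[{graph.nodes[n].kind}] {n}: {graph.nodes[n].description}" for n in seeds]
    lines += [f"{s} --{r}--> {t}: {d}" for s, r, t, d in edges]
    return "\n".join(lines)


graph = KnowledgeGraph()
graph.add_node("Claude Monet", "artist", "French painter and a founder of Impressionism.")
graph.add_node("Impressionism", "movement", "19th-century movement focused on light and open-air painting.")
graph.add_node("Franco-Prussian War", "event", "1870-71 war between France and Prussia.")
graph.add_edge("Claude Monet", "associated_with", "Impressionism",
               "Monet's 'Impression, Sunrise' gave the movement its name.")
graph.add_edge("Claude Monet", "affected_by", "Franco-Prussian War",
               "Monet stayed in London during the war.")

seeds, edges = retrieve(graph, "Impressionist painting of a harbour at sunrise by Monet")
context = build_context(graph, seeds, edges)
print(context)
```

In a full system, the resulting context string would be passed together with the painting to a multimodal language model to produce the explanation; that generation step is omitted here because it depends on the model chosen.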