On the Use of Visual Transformer for Image Complexity Assessment

Luigi Celona, Gianluigi Ciocca, Raimondo Schettini

Published: 2024, Last Modified: 05 Nov 2025, Venue: VISIGRAPP (3): VISAPP 2024, License: CC BY-SA 4.0

Abstract: Perceiving image complexity is a crucial aspect of human visual understanding, yet explicitly assessing image complexity poses challenges. Historically, this aspect has been understudied due to its inherent subjectivity, stemming from its reliance on human perception, and the semantic dependency of image complexity in the face of diverse real-world images. Different computational models for image complexity estimation have been proposed in the literature. These models leverage a variety of techniques ranging from low-level, hand-crafted features, to advanced machine learning algorithms. This paper explores the use of recent deep-learning approaches based on Visual Transformer to extract robust information for image complexity estimation in a transfer learning paradigm. Specifically, we propose to leverage three visual backbones, CLIP, DINO-v2, and ImageNetViT, as feature extractors, coupled with a Support Vector Regressor with Radial Basis Function kernel as an image complexity estim
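As a rough illustration of the transfer-learning setup described in the abstract (features from a frozen visual backbone fed to a Support Vector Regressor with a Radial Basis Function kernel), the following Python sketch uses scikit-learn. It is not the authors' implementation. The random feature matrix is a stand-in for embeddings that would come from CLIP, DINO-v2, or an ImageNet ViT. The feature dimension, the synthetic scores, the standardization step, and the `C` and `epsilon` values are all assumptions made only so the example runs end to end.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-in for frozen backbone embeddings (assumption: 200 images, 64-dim).
# In practice each row would be the feature vector a pretrained backbone
# produces for one image; no backbone is loaded in this sketch.
n_images, feat_dim = 200, 64
features = rng.normal(size=(n_images, feat_dim))

# Synthetic complexity scores in (0, 1), used only to make the sketch runnable.
weights = rng.normal(size=feat_dim)
scores = 1.0 / (1.0 + np.exp(-(features @ weights) / np.sqrt(feat_dim)))

# Simple hold-out split (assumption: 150 training images, 50 test images).
train_x, test_x = features[:150], features[150:]
train_y, test_y = scores[:150], scores[150:]

# SVR with an RBF kernel on standardized features.
# StandardScaler, C, and epsilon are illustrative choices, not values from the paper.
regressor = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=1.0, epsilon=0.01),
)
regressor.fit(train_x, train_y)

# One predicted complexity score per held-out image.
predicted = regressor.predict(test_x)
print(predicted.shape)
```

Only the regressor is trained here; the backbone stays fixed, which is what makes this a feature-extraction form of transfer learning rather than fine-tuning.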