Keywords: psychovisual, visual codes, frequency domain, Fourier transform, representation learning, deep learning, vision, human vision
TL;DR: We design a frequency-domain representation module for high-level semantic abstraction, enabling psychovisual processing in vision models.
Abstract: Visual semantic concepts such as objects and categories provide a natural foundation for semantic reasoning, yet standard deep learning-based vision models routinely extract and aggregate features using homogeneous stacks of spatial layers. As a result, feature representations are learnt implicitly, without clear organisation, which renders decision-making processes opaque and difficult to interpret. Psychovisual processing offers a way to mimic how the brain encodes and interprets visual information, producing higher-level abstractions from low-level processing. In this paper, we propose Semantic Visual Coding (SVC), a learnt frequency-domain representation that introduces explicit psychovisual abstraction into convolutional neural networks (CNNs). Inspired by psychovisually motivated image codes from the 1990s, SVC learns band-limited filters that encode task-relevant semantics as distinct regions of the frequency domain. These filters converge towards sparse, data-driven coronal patterns, which suggest a natural representation scheme for semantic abstractions that support model reasoning. We also introduce PsychoNet, a framework that makes CNNs psychovisually aware by combining traditional low-level spatial feature extraction with high-level abstraction in the frequency domain via SVC. Salience analyses show that PsychoNet’s spatial layers extract highly interpretable object parts and morphological features, unlike the blob-like regions produced by standard CNNs. They further show that SVC forms structured selections of these parts, organised by spatial scale, suggesting frequency-domain abstraction as a promising direction for interpretable models that reveal the semantic features they employ.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18407