Keywords: psychovisual, visual codes, frequency domain, Fourier transform, representation learning, deep learning, vision, human vision
TL;DR: We design a frequency-domain representation module for high-level semantic abstraction, enabling psychovisual processing in vision models.
Abstract: Visual semantic concepts such as objects and categories provide a natural foundation for structured reasoning, yet models like convolutional neural networks (CNNs) and transformers routinely extract and aggregate features using homogeneous stacks of spatial layers. Such stacks entangle feature extraction and reasoning, rendering decision-making processes opaque and difficult to interpret. Psychovisual processing offers a way to mimic how the brain encodes and interprets visual information, producing higher-level abstractions from low-level processing. In this paper, we propose Semantic Visual Coding (SVC), a learnt frequency-domain representation that introduces explicit psychovisual abstraction into CNNs. Inspired by psychovisual codes from the 1990s, SVC learns band-limited filters that encode task-relevant semantics as distinct regions of the Discrete Fourier Transform (DFT). These filters converge towards sparse, data-driven coronal patterns, suggesting a natural representation scheme for high-level features. We also introduce PsychoNet, a framework that makes CNNs psychovisually aware by combining traditional low-level feature extraction with frequency-domain abstraction and reasoning via SVC. Salience analyses show that PsychoNet’s spatial layers extract highly interpretable object parts and morphological features, unlike the blob-like regions produced by standard CNNs. By tracing gradient flow, we find that SVC likely leverages these parts to form abstract representations of the semantic features of image categories, highlighting frequency-domain abstraction as a compelling direction for interpretable model reasoning and semantics-based decision making.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18407
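To make the abstract's notion of a band-limited filter occupying a distinct, ring-shaped ("coronal") region of the DFT more concrete, the following is a minimal NumPy sketch. It is not the SVC module itself: SVC learns its filters from data, whereas the band edges here are fixed by hand, and the function names `annular_band_mask` and `band_limit` as well as the example band edges are assumptions made purely for illustration.

```python
import numpy as np


def annular_band_mask(height, width, r_low, r_high):
    """Boolean mask over the centred 2D DFT selecting radial frequencies in [r_low, r_high).

    Frequencies are in cycles per pixel, so the largest possible radius is sqrt(0.5).
    (Hypothetical helper, not part of the paper.)
    """
    fy = np.fft.fftshift(np.fft.fftfreq(height))
    fx = np.fft.fftshift(np.fft.fftfreq(width))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (radius >= r_low) & (radius < r_high)


def band_limit(feature_map, r_low, r_high):
    """Keep only the DFT coefficients of a 2D map that fall inside one annular band."""
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    mask = annular_band_mask(feature_map.shape[0], feature_map.shape[1], r_low, r_high)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask depends only on |frequency|, so the result is real up to rounding error.
    return filtered.real, mask


# Usage: split a random stand-in "feature map" into three hand-picked bands.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((32, 32))
band_edges = [(0.0, 0.1), (0.1, 0.25), (0.25, 1.0)]  # illustrative values only
components = [band_limit(feature_map, lo, hi)[0] for lo, hi in band_edges]
```

Because the three bands in this sketch tile the whole spectrum without overlap, the components sum back to the input. A learnt variant would presumably replace the hard boolean mask with trainable weights over the spectrum, but that design is not specified in the abstract and is left open here.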