Keywords: Texture Recognition, Adaptive Learning, Intrinsic Features, Prototype Learning, Vision Transformer
Abstract: State-of-the-art texture recognition models often rely on cumbersome external memory banks and complex training pipelines. We challenge this paradigm by proposing a simple yet powerful alternative: learning from the rich intrinsic patterns within each image itself. We introduce STP-Former (Simple Texture Prototype Transformer), an architecture that dynamically distills a compact set of intrinsic prototypes for each input sample. A lightweight cross-attention module, the Texture Prototype Extractor (TPE), learns to identify and aggregate an image's most representative texture primitives on the fly. These adaptive prototypes, inherently aligned with the input's context, form a powerful basis for robust classification. Our contributions are twofold. First, we propose a decoupled two-stage training strategy in which the TPE is pre-trained with a self-supervised objective to capture fundamental texture representations before a classifier is fine-tuned. Second, to endow the learned feature space with a robust geometric structure, we introduce a novel Supervised Topological Loss. Grounded in persistent homology, this objective directly optimizes for intra-class compactness and inter-class separation, pushing the boundaries of discriminability. This synergistic framework yields a remarkable performance leap: on the challenging DTD benchmark, STP-Former improves accuracy from 79% to over 86%. Our work demonstrates that an adaptive, self-contained approach provides a more effective and efficient paradigm for texture recognition.
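The abstract's core mechanism, a small set of learnable queries cross-attending over a sample's patch tokens to distill per-image prototypes, can be sketched as follows. This is a minimal NumPy illustration of single-head cross-attention under assumed shapes; the function name, prototype count, and dimensions are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def texture_prototype_extractor(patch_tokens, proto_queries):
    """Cross-attention sketch: prototype queries aggregate patch tokens.

    patch_tokens:  (num_patches, d) features, e.g. from a ViT backbone
    proto_queries: (num_prototypes, d) learnable query vectors
    returns:       (num_prototypes, d) input-adaptive texture prototypes
    """
    d = patch_tokens.shape[-1]
    # Each prototype query attends over all patch tokens of this image.
    attn = softmax(proto_queries @ patch_tokens.T / np.sqrt(d))  # (P, N)
    # Prototypes are attention-weighted averages of the patch tokens,
    # so they adapt to each input sample's texture statistics.
    return attn @ patch_tokens                                   # (P, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((196, 64))   # e.g. a 14x14 patch grid
queries = rng.standard_normal((8, 64))    # 8 hypothetical prototypes
protos = texture_prototype_extractor(tokens, queries)
print(protos.shape)  # (8, 64)
```

In the paper's framing these prototypes then serve as the classification basis; in a real model the queries would be trained end-to-end (and, per the abstract, pre-trained self-supervised before classifier fine-tuning).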
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 16677