Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image Classification

Published: 25 Oct 2025, Last Modified: 29 Jan 2026, OpenReview Archive Direct Upload, Everyone, CC BY 4.0
Abstract: Multi-label image classification requires simultaneously recognizing multiple objects with complex interdependencies. While existing attention-based methods are prominent, their performance is hampered by two forms of representation entanglement: 1) Spatial entanglement, where contextual interference from backgrounds and co-occurring objects confuses specific object representations; 2) Semantic entanglement, where models overfit label co-occurrence priors, thereby impairing a genuine semantic understanding of the image. To address these challenges, we propose an Object-Purified Representation Learning framework. Concretely, for spatial entanglement, we propose the Spatial-wise Representation Purification Module that employs Spatial-Purified Attention to eliminate object-irrelevant feature activations for contextual interference reduction, combined with Spatial-Aware Supervision to enhance object perception capability. For semantic entanglement, we develop the Semantic-wise Association Purification Module that synergistically integrates our proposed average message with the original co-occurrence-based message. This design effectively models co-occurrence relationships while preventing their overemphasis. Furthermore, we design the Bidirectional Representation Refinement Module to efficiently enhance representations, further boosting classification performance. Extensive experiments on multiple benchmark datasets with different configurations demonstrate that our proposed method achieves state-of-the-art performance.
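
The abstract does not specify how the average message and the co-occurrence-based message are combined. As a purely illustrative sketch, one simple way to temper co-occurrence priors is a convex blend of the two messages over label features. The function names, the mixing weight `alpha`, and the row-normalization step below are all assumptions made for this example and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(counts: Matrix) -> Matrix:
    """Turn raw co-occurrence counts into row-stochastic weights."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total > 0:
            normalized.append([v / total for v in row])
        else:
            # Assumption: a label with no recorded co-occurrences gets uniform weights.
            normalized.append([1.0 / len(row)] * len(row))
    return normalized


def blend_messages(label_feats: Matrix, cooccur: Matrix, alpha: float) -> Matrix:
    """Blend a co-occurrence-weighted message with a plain average message.

    alpha is a hypothetical mixing weight in [0, 1]:
    alpha=1 keeps only the co-occurrence message, alpha=0 only the average message.
    """
    num_labels = len(label_feats)
    dim = len(label_feats[0])
    weights = row_normalize(cooccur)

    # Average message: every label receives the same mean over all label features.
    avg_msg = [sum(f[d] for f in label_feats) / num_labels for d in range(dim)]

    blended = []
    for i in range(num_labels):
        # Co-occurrence message: neighbors weighted by how often they appear with label i.
        co_msg = [
            sum(weights[i][j] * label_feats[j][d] for j in range(num_labels))
            for d in range(dim)
        ]
        blended.append(
            [alpha * co_msg[d] + (1.0 - alpha) * avg_msg[d] for d in range(dim)]
        )
    return blended


# Toy usage: two labels with one-hot features that always co-occur.
feats = [[1.0, 0.0], [0.0, 1.0]]
counts = [[0.0, 4.0], [4.0, 0.0]]
mixed = blend_messages(feats, counts, alpha=0.5)
```

In this toy setup, lowering `alpha` pulls each label's incoming message toward the shared mean, which is one way to read "preventing their overemphasis"; the actual module in the paper may differ substantially.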