Abstract: Fine-grained leaf image retrieval (FGLIR) is a new unsupervised pattern recognition task in content-based image retrieval (CBIR). It aims to distinguish varieties/cultivars of leaf images within a given plant species and is more challenging than the general leaf image retrieval task because the differences between cultivars are inherently subtle. In this study, we investigate, for the first time, how to mine spatial structure and contextual information from the activations of the convolutional layers of CNNs for FGLIR. To this end, we design a novel geometrical structure, named Triplet Patch-Pairs Composite Structure (TPCS), consisting of three symmetric patch pairs segmented from the leaf image in different orientations. We extract a CNN feature map for each patch in the TPCS and measure the difference between the feature maps of each patch pair to construct a local deep self-similarity descriptor. By varying the size of the TPCS, we obtain multi-scale deep self-similarity descriptors. The final aggregation of the local deep self-similarity descriptors, named Structural Deep Patch Representation (SDePR), not only encodes the spatial structure and contextual information of leaf images in the deep feature domain, but is also invariant to geometric transformations. Extensive experiments on challenging public FGLIR tasks show that SDePR outperforms state-of-the-art handcrafted visual features and deep retrieval models.
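The patch-pair idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the 0/60/120 degree orientations, the use of an L2 norm as the difference measure, and `patch_features`, a 2x2 average-pooling stand-in for the CNN feature map. The sketch does not attempt to reproduce the invariance properties or the aggregation step of SDePR.

```python
import numpy as np

def patch_features(patch):
    """Hypothetical stand-in for a CNN feature map: 2x2 average pooling."""
    h, w = patch.shape
    h2, w2 = h - h % 2, w - w % 2          # trim to even size before pooling
    p = patch[:h2, :w2]
    return p.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def crop(image, cy, cx, half):
    """Square patch of side 2*half centred at (cy, cx); caller keeps it in bounds."""
    return image[cy - half:cy + half, cx - half:cx + half]

def tpcs_descriptor(image, center, radius, half, angles_deg=(0.0, 60.0, 120.0)):
    """One value per patch pair: distance between the two patches' features.

    Each pair is placed symmetrically about `center` at distance `radius`
    along one orientation (the three default angles are an assumption).
    """
    cy, cx = center
    values = []
    for angle in np.deg2rad(angles_deg):
        dy = int(round(radius * np.sin(angle)))
        dx = int(round(radius * np.cos(angle)))
        f_a = patch_features(crop(image, cy + dy, cx + dx, half))
        f_b = patch_features(crop(image, cy - dy, cx - dx, half))
        values.append(np.linalg.norm(f_a - f_b))  # assumed difference measure
    return np.array(values)

def multiscale_descriptor(image, center, radii, half):
    """Concatenate the per-scale descriptors obtained by varying the structure size."""
    return np.concatenate([tpcs_descriptor(image, center, r, half) for r in radii])
```

For a 64x64 grayscale array `img`, `multiscale_descriptor(img, (32, 32), radii=(8, 16), half=4)` returns a vector with three values per scale. In a real pipeline the pooling stand-in would be replaced by convolutional activations from a pretrained network.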
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: This work uses deep learning techniques to address fine-grained leaf image retrieval, which is a new and challenging problem in the content-based image retrieval (CBIR) research community.
Submission Number: 3815