Multimodal Product Identification: Submission to Watch and Buy 2021 Challenge

2021 (modified: 30 Nov 2022), WAB @ ACM Multimedia 2021
Abstract: This technical report gives an overview of our approach to the "Watch and Buy: Multimodal Product Identification Challenge". Specifically, we tackle the problem with a three-stage framework: product detection, retrieval, and classification. For product detection, we use Cascade R-CNN with deformable convolution to alleviate the impact of image distortion. For product retrieval, we enhance the Multiple Granularity Network (MGN) with global and local context through IBN, SE, and Non-local blocks. Product classification suffers from fashion variation; to address this, we propose fusing the global feature of the whole image with the local feature of the product. Experiments demonstrate that our methods achieve performance competitive with state-of-the-art methods, and our overall approach achieves an F1 score of 0.648, ranking second in the final challenge.
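For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of the standard F1 score, the harmonic mean of precision and recall. The function name `f1_score` and the counts passed to it are hypothetical and chosen only for illustration; the challenge's exact evaluation protocol (for example, how predictions are matched to ground truth or averaged across videos) is not described here and may differ.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    Returns 0.0 when precision or recall is undefined or zero.
    """
    predicted = true_positives + false_positives  # items the system returned
    actual = true_positives + false_negatives     # items it should have returned
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the report):
# 6 correct products, 2 spurious predictions, 4 missed products.
score = f1_score(true_positives=6, false_positives=2, false_negatives=4)
print(f"F1 = {score:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a pipeline has to keep both detection misses and false identifications low to score well.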