Joint Training of Product Detection and Recognition Using Task-Specific Datasets

Floris De Feyter, Toon Goedemé

Published: 2023, Last Modified: 27 Sept 2025, Venue: VISIGRAPP (5: VISAPP) 2023, License: CC BY-SA 4.0

Abstract: Training a single model jointly for detection and recognition is typically done with a fully annotated dataset, i.e., one whose annotations consist of boxes with class labels. In the case of retail product detection and recognition, however, developing such a dataset is very expensive due to the large variety of products. It would be much more cost-efficient and scalable to employ two task-specific datasets: one detection-only and one recognition-only dataset. Unfortunately, experiments indicate a significant drop in performance when the model is trained on task-specific data. Given the potential cost savings, we are convinced that more research should be done on this matter. We therefore propose a set of training procedures that allows us to carefully investigate the differences between training with fully annotated and task-specific data. We demonstrate this on a product detection and recognition dataset and as such reveal one of the core issues that is inherent to task-specifi

External IDs: dblp:conf/visapp/FeyterG23