Real-time large-scale visual concept detection with linear classifiers

Mats Sjöberg, Markus Koskela, Satoru Ishikawa, Jorma Laaksonen

Published: 2012, Last Modified: 11 Nov 2024, ICPR 2012, CC BY-SA 4.0

Abstract: Many emerging application areas in video and image processing require real-time or faster visual concept detection. Examples include indexing of online user-generated video content and 24/7 archiving of TV broadcasts. The current state of the art in concept detection uses bag-of-visual-words features with computationally heavy kernel-based classifiers. We argue that this approach is not feasible for real-time applications, and propose instead to use combinations of fast linear classifiers. In experiments with the large-scale TRECVID 2011 video database and 50 concepts, we compare several methods to improve the retrieval performance of standard linear classifiers. Fusing classifiers trained on different features and using multi-learn and homogeneous kernel maps achieve state-of-the-art retrieval precision, while retaining real-time performance even for large sets of concepts.
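
To illustrate how a homogeneous kernel map lets a linear classifier approximate a kernel classifier, the following is a minimal sketch, not the authors' implementation. It assumes the additive chi-squared kernel k(x, y) = 2xy / (x + y) on histogram features and a plain arithmetic mean for fusing per-feature classifier scores; the abstract specifies neither. All function names and the parameters `order` and `interval` are hypothetical choices made for this example.

```python
import math


def chi2_feature_map(x, order=3, interval=0.5):
    """Approximate explicit feature map for one non-negative scalar x.

    Assumption: the kernel is the additive chi2 kernel 2xy / (x + y),
    whose spectrum is sech(pi * lambda). The map has 2 * order + 1 dims.
    """
    if x <= 0.0:
        # k(0, y) = 0, so a zero vector reproduces the kernel exactly.
        return [0.0] * (2 * order + 1)
    log_x = math.log(x)
    feats = [math.sqrt(x * interval)]  # j = 0 term, since sech(0) = 1
    for j in range(1, order + 1):
        lam = j * interval
        scale = math.sqrt(2.0 * x * interval / math.cosh(math.pi * lam))
        feats.append(scale * math.cos(lam * log_x))
        feats.append(scale * math.sin(lam * log_x))
    return feats


def map_histogram(hist, order=3, interval=0.5):
    """Apply the scalar map to every bin and concatenate the results."""
    out = []
    for value in hist:
        out.extend(chi2_feature_map(value, order, interval))
    return out


def exact_chi2(a, b):
    """Exact additive chi2 kernel between two histograms, for comparison."""
    return sum(2.0 * x * y / (x + y) for x, y in zip(a, b) if x + y > 0.0)


def linear_score(weights, bias, features):
    """Score of a linear classifier: one dot product plus a bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def fuse_scores(scores):
    """Late fusion by arithmetic mean (an assumed fusion rule)."""
    return sum(scores) / len(scores)
```

With the mapped features, the inner product of two mapped histograms stays close to the exact kernel value, so a linear model trained on them behaves like an approximate kernel model while scoring costs only one dot product per concept. The classifier weights would come from a separate linear training step, which this sketch leaves out.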